AN ESTIMATION METHOD FOR THE CHI-SQUARE DIVERGENCE WITH APPLICATION TO TEST OF HYPOTHESES
Abstract. We propose a new definition of the chi-square divergence between distributions. Based on convexity properties and duality, this version of the $\chi^2$ is well suited both for the classical applications of the $\chi^2$ to the analysis of contingency tables and for statistical tests in parametric models, for which it has been advocated to be robust against inliers.

We present two applications in testing. In the first one we deal with tests for finite and infinite numbers of linear constraints, while in the second one we apply the $\chi^2$-methodology to parametric testing against contamination.
1. Introduction

The $\chi^2$ distance is commonly used for categorized data. For the continuous case, optimal groupings pertaining to the $\chi^2$ criterion have been proposed by various authors; see e.g. [5], [18], [12]. These methods are mainly applied for tests, since they may lead to some bias effect in estimation.

This paper introduces a new approach to the $\chi^2$, placing its study within the range of divergence-based methods and presenting a technique that avoids grouping for estimation and testing. Let us first introduce some notation.

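Let $M_1$ denote the set of all probability measures on $\mathbb{R}^d$ and $M$ the set of all signed measures on $\mathbb{R}^d$ with total mass 1. For $P \in M_1$ and $Q \in M$, introduce the $\chi^2$ distance between $Q$ and $P$ by

$$\chi^2(Q,P) = \begin{cases} \displaystyle\int \left(\frac{dQ}{dP} - 1\right)^2 dP & \text{if } Q \ll P,\\ \infty & \text{otherwise.}\end{cases} \qquad (1.1)$$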
For a subset $\Omega$ of $M$ denote

$$\chi^2(\Omega,P) = \inf_{Q \in \Omega} \chi^2(Q,P), \qquad (1.2)$$

with $\inf\{\emptyset\} = \infty$.

When the infimum in (1.2) is reached at some measure $Q^*$ that belongs to $\Omega$, then $Q^*$ is the projection of $P$ on $\Omega$. The role of the class of measures $M$ will become clear later. It allows $Q^*$ to be obtained easily through usual optimization methods, which can be quite difficult when we consider subsets $\Omega$ of $M_1$.

For a testing problem such as $H_0: P \in \Omega$ vs $H_1: P \notin \Omega$, the test statistic will be an estimate of $\chi^2(\Omega,P)$. This quantity equals 0 under $H_0$, since in that case $P = Q^*$. Therefore, under $H_0$, there is no restriction in taking $\Omega$ to be a subset of $M$.

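The $\chi^2$ distance belongs to the so-called $\varphi$-divergences, defined through

$$\varphi(Q,P) = \begin{cases} \displaystyle\int \varphi\left(\frac{dQ}{dP}\right) dP & \text{if } Q \ll P,\\ \infty & \text{otherwise,}\end{cases} \qquad (1.3)$$

where $\varphi$ is a convex function defined on $\mathbb{R}^+$ satisfying $\varphi(1) = 0$. The $\chi^2$ distance (1.1) corresponds to $\varphi(x) = (x-1)^2$. This class of discrepancy measures between probability measures was introduced by I. Csiszár [10], and the monograph by F. Liese and I. Vajda provides their main properties.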

The extension of $\varphi$-divergences to the case where $Q$ is assumed to be in $M$ is presented in [8], in the context of parametric estimation and tests.

The class of minimum $\varphi$-divergence test statistics includes, among others, the log-likelihood ratio test.

For this class it is well known that first-order efficiency is not a useful criterion of discrimination. A notion of robustness against model contamination is found in Lindsay [21] (for estimators) and in Jimenez and Shao [16] (for test procedures). It provides a tool for comparing the tests associated with different divergences. Although their argument deals with finite support models, it may serve as a benchmark for more general situations.

These papers show that the minimum Hellinger distance test provides a reasonable compromise between robustness against model contaminations induced by outliers and robustness against those induced by inliers.

However, the model may be subject to inlier contaminations only (namely, missing data problems). This will be advocated in the present paper for contamination models. In that case the minimum $\chi^2$-divergence test behaves better than both the minimum Hellinger distance test and the log-likelihood ratio test in terms of their residual adjustment functions (RAF), because (we refer to [16] for the notation)

$$\frac{A_{\chi^2}(-1)}{A_{LR}(-1)} = \frac{1}{2} < 1 \quad\text{and}\quad \frac{A_{\chi^2}(-1)}{A_{HD}(-1)} = \frac{1}{4} < 1.$$



Formula (1.1) is not suitable as such for statistical purposes. Indeed, suppose that we are interested in testing whether $P$ is in some class $\Omega$ of distributions with an absolutely continuous component. Let $X = (X_1, \ldots, X_n)$ be an i.i.d. sample with unknown distribution $P$. Assume that the empirical measure $P_n := \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$ pertaining to $X$ is the only information available on $P$, where $\delta_x$ is the Dirac measure at point $x$. Then, for all $Q \in \Omega$, the $\chi^2$ distance between $Q$ and $P_n$ is infinite. Therefore no plug-in technique can lead to a well-defined statistic in this usual case.
Our approach solves this difficulty and is based on the "dual representation" of the $\chi^2$ divergence. This representation follows from the convexity of the mapping $Q \mapsto \chi^2(Q,P)$ together with some regularity property. It is set out in Section 2, together with conditions under which $P$ has a $\chi^2$-projection on $\Omega$. We will also provide an estimate of the function $dQ^*/dP$, which describes the local changes induced on $P$ by the projection operator.

In some cases it is possible to replace $\Omega$ by $\Omega \cap \Lambda_n$, where $\Lambda_n$ is the set of all measures in $M$ whose support is the sample $\{X_1, \ldots, X_n\}$. This requires the intersection to be non-void, as happens for example when $\Omega$ is defined through moment conditions. This approach is called the Generalized Likelihood paradigm (see [53] and references therein). In Section 3 we develop a complete study of this case for the $\chi^2$ divergence, in the event that $\Omega$ is defined through linear constraints, namely when

$$\Omega = \left\{Q \in M \text{ such that } \int f(x)\,dQ(x) = 0\right\} \qquad (1.4)$$

for some $\mathbb{R}^k$-valued function $f$ defined on $\mathbb{R}^d$. In this case the projection $Q^*$ has a very simple form, and its estimate is obtained as the solution of a linear system of equations. This motivates the choice of the $\chi^2$ criterion for tests of the form $H_0: P \in \Omega$ with $\Omega$ as in (1.4). As shown in Section 3, Theorem 2.2 reduces the constrained problem to an unconstrained one.

For the problem of testing whether $P$ belongs to $\Omega$, our results also include the asymptotic distribution of the test statistics under any $P$ in the alternative. This proves the consistency of the procedure, a result not addressed in the current literature on Generalized Likelihood.

In Section 3 we will apply the above results to the case of a test of fit, where $\Omega = \{P_0\}$ is a fixed p.m.

When $\Omega \cap \Lambda_n$ is void, smoothing techniques have been proposed following [2], which substitute a regularized version for $P_n$; see [23]. In those cases we have chosen not to smooth, and instead exploit the dual representation in a parametric context. Section 4 addresses this approach through the study of contamination models. These form a composite problem in which the contamination modifies a distribution with unknown parameter.



2. The definition of the estimator 

2.1. Some properties of the $\chi^2$-distance. We will consider sets $\Omega$ of signed measures with total mass 1 that integrate some class of functions $\Phi$. The choice of $\Phi$ depends on the context, as seen below. Let

$$M_\Phi := \left\{Q \in M \text{ such that } \int |\varphi|\, d|Q| < \infty \text{ for all } \varphi \in \Phi\right\}. \qquad (2.1)$$

We first consider sufficient conditions for the existence of $Q^*$, the projection of $P$ on $\Omega$. We introduce the following notation.
Let $\bar\Phi = \Phi \cup B_b$, where $B_b$ is the class of all measurable bounded functions on $\mathbb{R}^d$. Let $\tau_\Phi$ be the coarsest topology on $M_\Phi$ that makes the mappings $Q \mapsto \int \varphi\, dQ$ continuous for all $\varphi \in \bar\Phi$. When $\Phi$ is restricted to $B_b$, this is the usual $\tau$-topology (see e.g. [13]).

Assume that for every function $\varphi$ in $\Phi$ there exists some positive $\varepsilon$ with

$$\int |\varphi|^{2+\varepsilon}\, dP < \infty.$$

Suppose that $\Omega$ is a closed set in $M_\Phi$ equipped with the $\tau_\Phi$ topology and that $\chi^2(\Omega,P)$ is finite. Then, as a consequence of Theorem 2.3 in [8], $P$ has a projection in $\Omega$. Moreover, when $\Omega$ is convex, this projection is unique.

In statistical applications the set $\Omega$ is often defined through some statistical functional; for example, let $\Omega$ be defined as in (1.4). In this case $\Phi = \{f\}$, and $\Omega$ is closed by the very definition of $\tau_\Phi$. The choice of the class of functions $\Phi$ is therefore intimately connected with the set $\Omega$ under consideration. The class $\Phi$ can also be defined with respect to $\Omega$ when $\Omega$ is a subset of some parametric family of distributions, as seen in Section 4 and as developed in [9].

We first provide a characterization of the $\chi^2$-projection of a p.m. $P$ on some set $\Omega$ in $M$.

Let $D$ denote the domain of the divergence for fixed $P$, namely

$$D = \{Q \in M \text{ such that } \chi^2(Q,P) < \infty\}.$$

We have (see [8], Theorem 2.6)

Theorem 2.1. Let $\Omega$ be a subset of $M$. Then

(1) If there exists some $Q^*$ in $\Omega$ such that, for all $Q$ in $\Omega \cap D$,

$$q^* \in L_1(Q) \quad\text{and}\quad \int q^*\, dQ^* \le \int q^*\, dQ,$$

where $q^* = dQ^*/dP$, then $Q^*$ is the $\chi^2$-projection of $P$ on $\Omega$.

(2) If $\Omega$ is convex and $P$ has projection $Q^*$ on $\Omega$, then, for all $Q$ in $\Omega$, $q^*$ belongs to $L_1(Q)$ and $\int q^*\, dQ^* \le \int q^*\, dQ$.

Many statistically relevant problems in estimation and testing pertain to models defined by linear constraints (Empirical Likelihood paradigm and others). Section 3 is devoted to this case. We therefore present a characterization result for the $\chi^2$-projection on sets of measures defined by linear constraints.

Let $\Phi$ be a collection (finite or infinite, countable or not) of real-valued functions defined on $\mathbb{R}^d$, which we assume contains the function 1. Let $\Omega$, a subset of $M$, be defined by

$$\Omega = \left\{Q \in M \text{ such that } \int g\, dQ = a_g \text{ for all } g \text{ in } \Phi\right\}.$$

Denote by $\langle \Phi \rangle$ the linear span of $\Phi$.

We then have the following result (see [8]): 
Theorem 2.2. (1) $P$ has a projection $Q^*$ in $\Omega$ iff $Q^*$ belongs to $\Omega$ and, for all $Q \in \Omega$, $q^* \in L_1(Q)$ and $\int q^*\, dQ^* \le \int q^*\, dQ$.

(2) If $q^*$ belongs to $\langle \Phi \rangle$ and $Q^*$ belongs to $\Omega$, then $Q^*$ is the projection of $P$ on $\Omega$.

(3) If $P$ has projection $Q^*$ on $\Omega$, then $q^*$ belongs to $\overline{\langle \Phi \rangle}$, the closure of $\langle \Phi \rangle$ in …

Remark 2.3. The above result only partially characterizes the projections. Let $P$ be the uniform distribution on $[0,1]$. The set $M_1(P)$ of all p.m.'s absolutely continuous with respect to $P$ is a closed subset of $M_\Phi$ when $\Phi := \{x \mapsto x\} \cup \{x \mapsto 1\}$. Let $\Omega := \{Q \in M_1(P) : \int x\, dQ(x) = 1/4\}$. Then $P$ has a projection on $\Omega$, and $q^*(x) = c_0 + c_1 x$ on $\{q^* > 0\}$, with $q^* = dQ^*/dP$. The support of $Q^*$ is strictly included in $[0,1]$. Otherwise, the two moment conditions $c_0 + c_1/2 = 1$ and $c_0/2 + c_1/3 = 1/4$ would give $c_0 = 5/2$ and $c_1 = -3$. This is a contradiction, since then $q^*(1) = -1/2 < 0$ and $Q^*$ is not a probability measure.
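The contradiction in Remark 2.3 comes down to a 2x2 linear system, so it can be checked in exact rational arithmetic. The following Python sketch assumes full support, solves the two moment equations for $(c_0, c_1)$, and evaluates $c_0 + c_1 x$ at $x = 1$. The function name `full_support_coefficients` is hypothetical and used for illustration only.

```python
from fractions import Fraction


def full_support_coefficients(a):
    """Coefficients (c0, c1) of q*(x) = c0 + c1*x if Q* had full support on [0,1].

    They solve the moment equations under P = U[0,1]:
        c0     + c1 / 2 = 1   (total mass)
        c0 / 2 + c1 / 3 = a   (mean constraint)
    """
    m11, m12 = Fraction(1), Fraction(1, 2)
    m21, m22 = Fraction(1, 2), Fraction(1, 3)
    det = m11 * m22 - m12 * m21
    # Cramer's rule in exact rational arithmetic.
    c0 = (Fraction(1) * m22 - m12 * a) / det
    c1 = (m11 * a - m21 * Fraction(1)) / det
    return c0, c1


c0, c1 = full_support_coefficients(Fraction(1, 4))
q_at_one = c0 + c1          # value of c0 + c1*x at x = 1
print(c0, c1, q_at_one)     # 5/2 -3 -1/2
```

The negative value at $x = 1$ is exactly why the projection must put zero mass near the right end of $[0,1]$.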

2.2. An alternative version of the $\chi^2$. For fixed $P$ in $M_1$, the $\chi^2$ distance $Q \mapsto \chi^2(Q,P) = \int \left(\frac{dQ}{dP} - 1\right)^2 dP$ on $M$ is a convex function. As such, it is the upper envelope of its supporting hyperplanes. The first result, which is Proposition 2.1 in [9], describes the hyperplanes in $M_\Phi$.

Proposition 2.4. Equip $M_\Phi$ with the $\tau_\Phi$-topology. Then $M_\Phi$ is a Hausdorff locally convex topological space. Further, the topological dual space of $M_\Phi$ is the set of all mappings $Q \mapsto \int f\, dQ$ with $f$ in $\langle \bar\Phi \rangle$.

Proposition 2.3 in [8] asserts that, for fixed $P$ in $M_1$, the $\chi^2$ distance defined on $M$ is l.s.c. in $(M_\Phi, \tau_\Phi)$. We can now state the duality lemma.
Define on $\langle \bar\Phi \rangle$ the Fenchel-Legendre transform of $\chi^2(\cdot,P)$:

$$T(f,P) := \sup_{Q \in M_\Phi} \int f\, dQ - \chi^2(Q,P). \qquad (2.2)$$

We have 

Lemma 2.5. The function $Q \mapsto \chi^2(Q,P)$ admits the representation

$$\chi^2(Q,P) = \sup_{f \in \langle \bar\Phi \rangle} \int f\, dQ - T(f,P). \qquad (2.3)$$

Standard optimization techniques yield

$$T(f,P) = \int f\, dP + \frac{1}{4}\int f^2\, dP$$

for all $f \in \langle \bar\Phi \rangle$; see e.g. [1], Chapter 4.

Classical convex optimization procedures show that the supremum in (2.3) is attained at the function $f^* = 2\left(\frac{dQ}{dP} - 1\right)$.
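On a finite support every integral becomes a weighted sum, so Lemma 2.5 and the formula for $T(f,P)$ can be checked directly. The sketch below assumes NumPy and strictly positive weights $p$. It uses hypothetical helper names (`chi2`, `T`, `dual_objective`) to compare $\chi^2(Q,P)$ with the dual objective at $f^* = 2(dQ/dP - 1)$, and checks that random perturbations of $f^*$ never do better.

```python
import numpy as np


def chi2(q, p):
    """chi^2(Q,P) = sum_x (q(x)/p(x) - 1)^2 p(x) on a finite support with p > 0."""
    return float(np.sum((q / p - 1.0) ** 2 * p))


def T(f, p):
    """Fenchel-Legendre transform T(f,P) = P f + (1/4) P f^2."""
    return float(np.sum(f * p) + 0.25 * np.sum(f ** 2 * p))


def dual_objective(f, q, p):
    """int f dQ - T(f,P), the quantity maximised over f in (2.3)."""
    return float(np.sum(f * q)) - T(f, p)


p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.1, 0.5])
f_star = 2.0 * (q / p - 1.0)               # candidate maximiser f* = 2(dQ/dP - 1)

print(chi2(q, p), dual_objective(f_star, q, p))   # both equal 1/3 up to rounding

# Perturbing f* can only decrease the (concave) dual objective.
rng = np.random.default_rng(0)
perturbed = [dual_objective(f_star + rng.normal(size=3), q, p) for _ in range(1000)]
print(max(perturbed) <= chi2(q, p) + 1e-12)        # True
```

For any perturbation $h$, the dual objective at $f^* + h$ equals its value at $f^*$ minus $\frac14 \int h^2\, dP$, which is why no perturbation can exceed $\chi^2(Q,P)$.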

We now consider a subclass $\mathcal{F}$ of $\langle \bar\Phi \rangle$ and we assume:

(C1) $f^*$ belongs to $\mathcal{F}$.
Therefore

$$\chi^2(Q,P) = \sup_{f \in \mathcal{F}} \int f\, dQ - T(f,P),$$

which we call the dual representation of the $\chi^2$.
This can be restated as follows: let

$$m_f(x) := \int f\, dQ - \left(f(x) + \frac{1}{4} f(x)^2\right).$$

Then

$$\chi^2(Q,P) = \sup_{f \in \mathcal{F}} \int m_f(x)\, dP(x). \qquad (2.4)$$



Hence we have

$$\chi^2(\Omega,P) = \inf_{Q \in \Omega} \sup_{f \in \mathcal{F}} \int m_f(x)\, dP(x). \qquad (2.5)$$

Suppose that $\Omega$ is defined through a finite number of linear constraints, say

$$\Omega = \left\{Q \in M : \int f_i(x)\, dQ(x) = a_i,\ 1 \le i \le k\right\},$$

that $P$ has a projection $Q^*$ on $\Omega$, and that $\operatorname{supp}\{Q^*\}$ is known to coincide with that of $P$. Then we may choose $\mathcal{F}$ as the linear span of $\{1, f_1, \ldots, f_k\}$, and (2.5) becomes a parametric unconstrained optimization problem, since, by Theorem 2.2 (3),

$$\chi^2(\Omega,P) = \sup_{c_0, c_1, \ldots, c_k} \left\{ c_0 + \sum_{i=1}^k c_i a_i - T\left(c_0 + \sum_{i=1}^k c_i f_i,\ P\right) \right\}.$$
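The unconstrained problem above is a concave quadratic in $(c_0, \ldots, c_k)$, so on a finite support its maximiser solves a linear system. The sketch below assumes NumPy and a discrete $P$ with positive weights. `dual_chi2_linear` is a hypothetical helper, not the authors' code. It recovers $q^* = 1 + f^*/2$ from the maximiser $f^* = c_0 + \sum_i c_i f_i$ and checks that $Q^*$ satisfies the constraints and that its primal $\chi^2$ equals the dual value. As in the text, $Q^*$ is only required to lie in $M$, so positivity is not enforced.

```python
import numpy as np


def dual_chi2_linear(p, F, a):
    """chi^2(Omega,P) for Omega = {Q in M : sum_x f_i(x) Q(x) = a_i, i = 1..k}.

    p : (m,) weights of P on m support points (all > 0)
    F : (m, k) matrix with F[x, i] = f_i(x)
    a : (k,) constraint values
    Maximises c0 + sum_i c_i a_i - T(c0 + sum_i c_i f_i, P) over (c0, ..., ck).
    """
    B = np.column_stack([np.ones(len(p)), F])   # basis {1, f_1, ..., f_k}
    b = np.concatenate([[1.0], a])               # total mass 1 and the a_i
    r = b - B.T @ p                              # linear part of the objective
    H = B.T @ (p[:, None] * B)                   # B' diag(p) B
    c = 2.0 * np.linalg.solve(H, r)              # stationary point of the concave quadratic
    value = 0.5 * c @ r                          # objective value at the maximiser
    q_star = p * (1.0 + 0.5 * (B @ c))           # dQ*/dP = 1 + f*/2 with f* = B c
    return value, q_star


p = np.array([0.1, 0.2, 0.3, 0.4])
F = np.array([[0.0], [1.0], [2.0], [3.0]])      # one constraint on the mean
a = np.array([1.5])                              # the mean under P is 2.0

value, q_star = dual_chi2_linear(p, F, a)
primal = np.sum((q_star / p - 1.0) ** 2 * p)     # chi^2(Q*, P) computed directly
print(value, primal, q_star.sum(), F[:, 0] @ q_star)
```

Here the dual value and the primal $\chi^2(Q^*,P)$ both equal $0.25$. The recovered $Q^*$ has total mass 1 and mean 1.5, as required by Theorem 2.2 (2).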

In some other cases we may have a complete description of all functions $dQ/dP$ when $Q$ belongs to $\Omega$. A typical example is when $P$ and $Q$ belong to parametric families.

2.3. The estimator $\chi^2_n$. Let us now present the estimate of $\chi^2(Q,P)$.

Given an i.i.d. sample $X_1, \ldots, X_n$ with common unknown distribution $P$, define the estimate of $\chi^2(Q,P)$ through

$$\chi^2_n(Q,P) := \sup_{f \in \mathcal{F}} \int m_f(x)\, dP_n(x), \qquad (2.6)$$

a plug-in version of (2.4).

We also define the estimate of $\chi^2(\Omega,P)$ through

$$\chi^2_n(\Omega,P) := \inf_{Q \in \Omega} \sup_{f \in \mathcal{F}} \int m_f(x)\, dP_n(x). \qquad (2.7)$$



These estimates may seem cumbersome. However, when the class $\mathcal{F}$ can be reduced to a reasonable degree of complexity, these estimates perform quite well and can be used for testing $P \in \Omega$ against $P \notin \Omega$. This will be made clear in the last two sections, which serve as examples of the present approach.

In some cases the sup and the inf operators in (2.5) can be commuted, and (2.5) becomes

$$\chi^2(\Omega,P) = \sup_{f \in \mathcal{F}} \inf_{Q \in \Omega} \int f\, dQ - T(f,P), \qquad (2.8)$$

in which the inf operator acts only on the linear functional $\int f\, dQ$.



ESTIMATION WITH APPLICATION TO TEST OF HYPOTHESES 






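Also, when (2.8) holds, we may define an estimate of $\chi^2(\Omega,P)$ through

$$\chi^2_n(\Omega,P) = \sup_{f \in \mathcal{F}} \inf_{Q \in \Omega} \int f\, dQ - T(f,P_n). \qquad (2.9)$$

When (2.8) holds, the limit properties of $\chi^2_n$ are quite easy to obtain. Indeed, by (2.8) and (2.9),

$$\chi^2_n(\Omega,P) - \chi^2(\Omega,P) = \left(\sup_{f \in \mathcal{F}} \inf_{Q \in \Omega} \int f\, dQ - T(f,P_n)\right) - \left(\sup_{f \in \mathcal{F}} \inf_{Q \in \Omega} \int f\, dQ - T(f,P)\right).$$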
Now define, for $R$ in $M_1$,

$$\phi_R(f) := \inf_{Q \in \Omega} \int f\, dQ - \int \left(f + \frac{1}{4} f^2\right) dR,$$

a concave function of $f$.

Suppose that $\mathcal{F}$ is compact in a topology for which $\phi_R$ is uniformly continuous for all $R$ in $M_1$. Then a sufficient condition for the a.s. convergence of $\chi^2_n(\Omega,P)$ to $\chi^2(\Omega,P)$ is

$$\lim_{n\to\infty} \sup_{f \in \mathcal{F}} \left|\phi_{P_n}(f) - \phi_P(f)\right| = 0 \quad \text{a.s.},$$

which in turn reads

$$\lim_{n\to\infty} \sup_{f \in \mathcal{F}} \left| \int \left(f + \frac{1}{4} f^2\right) dP_n - \int \left(f + \frac{1}{4} f^2\right) dP \right| = 0 \quad \text{a.s.}$$

This clearly holds when the class of functions $\{f + \frac{1}{4} f^2,\ f \in \mathcal{F}\}$ satisfies the functional Glivenko-Cantelli (GC) condition (see [25]).

The limit distribution of the statistic $\chi^2_n(\Omega,P)$ under $H_1$, i.e. when $P$ does not belong to $\Omega$, can be obtained under the following hypotheses. The argument closely follows the proof of Theorem 3.6 in [7], where a similar result is proved for the Kullback-Leibler divergence estimate.

Assume

(C2) $P$ has a unique projection $Q^*$ on $\Omega$.

(C3) The class $\mathcal{F}$ is compact in the sup-norm.

(C4) The class $\{f + \frac{1}{4} f^2,\ f \in \mathcal{F}\}$ is a functional Donsker class.

We then have

Theorem 2.6. Under $H_1$, assume that (C1)-(C4) hold. The asymptotic distribution of

$$\sqrt{n}\left(\chi^2_n(\Omega,P) - \chi^2(\Omega,P)\right)$$

is that of $B_P(g^*)$, where $B_P(\cdot)$ is the $P$-Brownian bridge defined on $\mathcal{F}$, and $g^* = -f^* - \frac{1}{4} f^{*2}$.

Therefore $\sqrt{n}\left(\chi^2_n(\Omega,P) - \chi^2(\Omega,P)\right)$ has an asymptotic centered normal distribution with variance

$$E_P\left[\left(f^* + \tfrac{1}{4} f^{*2}\right)^2(X)\right] - \left(E_P\left[\left(f^* + \tfrac{1}{4} f^{*2}\right)(X)\right]\right)^2,$$

where $X$ has law $P$.
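On a finite support, the asymptotic variance in Theorem 2.6 can be computed directly. The sketch below assumes that $f^* = 2(dQ^*/dP - 1)$ is the maximiser associated with the projection $Q^*$. With $r = dQ^*/dP$, this gives $f^* + \frac14 f^{*2} = r^2 - 1$, so the variance is simply $\mathrm{Var}_P(r^2)$. The example takes $P$ with weights $(0.1, 0.2, 0.3, 0.4)$ on $\{0,1,2,3\}$ and $Q^*$ its $\chi^2$-projection on $\{Q \in M : \int x\, dQ = 1.5\}$, which is $(0.2, 0.3, 0.3, 0.2)$. NumPy is assumed, and `asymptotic_variance` is a hypothetical helper.

```python
import numpy as np


def asymptotic_variance(p, q_star):
    """Var_P(f* + f*^2/4) with f* = 2(dQ*/dP - 1), on a finite support with p > 0."""
    f = 2.0 * (q_star / p - 1.0)
    g = f + 0.25 * f ** 2
    mean = np.sum(g * p)
    return float(np.sum((g - mean) ** 2 * p))


p = np.array([0.1, 0.2, 0.3, 0.4])
q_star = np.array([0.2, 0.3, 0.3, 0.2])   # chi^2-projection of p on {mean = 1.5}
r = q_star / p
# Since f* + f*^2/4 = r^2 - 1, the same variance is Var_P(r^2).
alt = float(np.sum(r ** 4 * p) - np.sum(r ** 2 * p) ** 2)
print(asymptotic_variance(p, q_star), alt)   # both equal 1.375 up to rounding
```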
The asymptotic distribution of $\chi^2_n$ under $H_0$, i.e. when $P$ belongs to $\Omega$, cannot be obtained in a general framework and must be derived according to the context.

In the next sections we develop two applications of the above statements. In the first one we consider sets $\Omega$ defined by an infinite number of linear constraints. We approximate $\Omega$ through a sieve technique and provide a consistent test for $H_0: P \in \Omega$. We then specialize this problem to the two-sample test for paired data. In this first application we thus rely on the representation of the projection $Q^*$ of $P$ on linear sets given by Theorem 2.2, and we project $P_n$ on the non-void set $\Omega \cap \Lambda_n$.

The second application deals with parametric models and tests for contamination. We obtain a consistent test for the case when $\Omega$ is a set of parametrized distributions $F_\theta$ for $\theta$ in $\Theta \subset \mathbb{R}^d$. The test is

$$H_0: P \in \Omega = \{F_\theta,\ \theta \in \Theta\}, \text{ i.e. } \lambda = 0, \quad\text{vs}\quad H_1: P \in \{(1-\lambda)F_\theta + \lambda R,\ \lambda \neq 0,\ \theta \in \Theta\}.$$

In this example we project $P_n$ on a set of absolutely continuous distributions, and we use the minimax assumption (2.8), which we prove to hold.

3. Test of a set of linear constraints

Let $\mathcal{F}$ be a countable family of real-valued functions defined on $\mathbb{R}^d$, $\{a_i\}_{i=1}^\infty$ a real sequence and

$$\Omega = \left\{Q \in M \text{ such that } \int f_i\, dQ = a_i \text{ for all } f_i \in \mathcal{F}\right\}. \qquad (3.1)$$

We assume that $\Omega$ is not void. In accordance with the previous section we assume that the function $f_0 \equiv 1$ belongs to $\mathcal{F}$, with $a_0 = 1$.

Let $X_1, \ldots, X_n$ be an i.i.d. sample with common distribution $P$.

We intend to propose a test for $H_0: P \in \Omega$ vs $H_1: P \notin \Omega$.

We first consider the case when $\mathcal{F}$ is a finite collection of functions, and then extend our results to the infinite case.

For notational convenience we write $Pf$ for $\int f\, dP$ whenever defined.

3.1. Finite number of linear constraints. Consider the set $\Omega$ defined in (3.1) with $\operatorname{card}\{\mathcal{F}\} = k$. Introduce the estimate of $\chi^2(\Omega,P)$ through

$$\chi^2_n := \inf_{Q \in \Omega \cap \Lambda_n} \chi^2(Q,P_n). \qquad (3.2)$$

Embedding the projection device in $M \cap \Lambda_n$ instead of $M_1 \cap \Lambda_n$ yields a simple solution for the optimum in (3.2), since no inequality constraints are needed. The topological context is also simpler than in the previous section, since the projection of $P_n$ belongs to $\mathbb{R}^n$. When developed in $M_1 \cap \Lambda_n$, this approach is known as the Generalized Likelihood (GEL) paradigm (see [11]). Our approach differs from the latter through the use of the dual representation (2.7), which provides consistency of the test procedure.
It is readily checked that

The set $\Omega \cap \Lambda_n$ is a convex closed subset of $\mathbb{R}^n$. Therefore, when the projection of $P_n$ on $\Omega \cap \Lambda_n$ exists, it is unique. In the next section we develop various properties of our estimates, which are based on the duality formula (2.6). The next subsections provide all limit properties of $\chi^2_n$.



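3.1.1. Notation and basic properties. Let $Q_0$ be any fixed measure in $\Omega$. By (2.5),

$$\chi^2(\Omega,P) = \sup_{f \in \langle\mathcal{F}\rangle} (Q_0 - P)f - \frac{1}{4} P f^2 = \sup_{a_0, a_1, \ldots, a_k} \left\{ \sum_{i=1}^k a_i (Q_0 - P) f_i - \frac{1}{4} P\left(\sum_{i=1}^k a_i f_i + a_0\right)^2 \right\}, \qquad (3.3)$$

since $Qf = Q_0 f$ for $Q$ in $\Omega$ and for all $f$ in $\mathcal{F}$. Similarly,

$$\chi^2_n = \sup_{a_0, a_1, \ldots, a_k} \left\{ \sum_{i=1}^k a_i (Q_0 - P_n) f_i - \frac{1}{4} P_n\left(\sum_{i=1}^k a_i f_i + a_0\right)^2 \right\}.$$

The infinite-dimensional optimization problem in (2.5) thus reduces to a $(k+1)$-dimensional one, which is much easier to handle.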

We can write $\chi^2(\Omega,P)$ and $\chi^2_n$ as quadratic forms. Define the vectors

$$v_n' := \left((Q_0 - P_n) f_1, \ldots, (Q_0 - P_n) f_k\right), \qquad (3.4)$$
$$v' := \left((Q_0 - P) f_1, \ldots, (Q_0 - P) f_k\right),$$
$$\gamma_n' := \sqrt{n}\left((P_n - P) f_1, \ldots, (P_n - P) f_k\right) = \sqrt{n}\left(v - v_n\right)'.$$

Let $S$ be the covariance matrix of $\gamma_n$. Write $S_n$ for the empirical version of $S$, obtained by substituting $P_n$ for $P$ in all entries of $S$.

Proposition 3.1. Let $\Omega$ be as in (3.1) and let $\operatorname{card}\{\mathcal{F}\}$ be finite. We then have

(i) $\chi^2_n = v_n' S_n^{-1} v_n$;
(ii) $\chi^2(\Omega,P) = v' S^{-1} v$.

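Proof. (i) Differentiating the function in (3.3), with $P_n$ in place of $P$, with respect to $a_s$, $s = 0, 1, \ldots, k$, yields

$$a_0 = -\sum_{i=1}^k a_i P_n f_i \qquad (3.5)$$

for $s = 0$, while for $s > 0$

$$(Q_0 - P_n) f_s = \frac{1}{2}\left(a_0 P_n f_s + \sum_{i=1}^k a_i P_n f_i f_s\right). \qquad (3.6)$$

Substituting (3.5) in the last display gives

$$(Q_0 - P_n) f_s = \frac{1}{2} \sum_{i=1}^k a_i \left(P_n f_i f_s - P_n f_s P_n f_i\right),$$

that is,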






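M. BRONIATOWSKI AND S. LEORATO

$$2 v_n = S_n a, \qquad (3.7)$$

where $a' = (a_1, a_2, \ldots, a_k)$.

Set $f^* = \arg\max_{f \in \langle\mathcal{F}\rangle} (Q_0 - P_n) f - \frac{1}{4} P_n f^2$. For every $h \in \langle\mathcal{F}\rangle$, $(Q_0 - P_n) h - \frac{1}{2} P_n h f^* = 0$. Setting $h = f^*$ gives $(Q_0 - P_n) f^* - \frac{1}{2} P_n (f^*)^2 = 0$.

It then follows, using (3.5) and (3.7), that

$$\chi^2_n = (Q_0 - P_n) f^* - \frac{1}{4} P_n (f^*)^2 = \frac{1}{4} P_n (f^*)^2 = \frac{1}{4} P_n\left(\sum_{i=1}^k a_i \left(f_i - P_n f_i\right)\right)^2 = \frac{1}{4}\, a' S_n a = v_n' S_n^{-1} v_n.$$

(ii) The proof is similar to the one above. □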
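Proposition 3.1 (i) can be checked numerically. Maximising the $(k+1)$-dimensional objective (3.3) with $P_n$ in place of $P$ through its first-order conditions (3.5)-(3.6) should return the same number as the closed form $v_n' S_n^{-1} v_n$. The Python sketch below assumes NumPy and takes, as a hypothetical example, $f_i = \mathbb{1}\{x \le u_i\}$ with $Q_0$ uniform on $[0,1]$, so that $Q_0 f_i = u_i$, together with a Beta(2,2) sample. Both function names are illustrative.

```python
import numpy as np


def chi2_closed_form(x, u):
    """chi^2_n = v_n' S_n^{-1} v_n (Proposition 3.1 (i)) for f_i = 1{x <= u_i}, Q0 = U[0,1]."""
    Fx = (x[:, None] <= u[None, :]).astype(float)   # n x k matrix of f_i(X_j)
    v_n = u - Fx.mean(axis=0)                        # (Q0 - P_n) f_i, since Q0 f_i = u_i
    S_n = np.cov(Fx, rowvar=False, bias=True)        # empirical covariance (divisor n)
    return float(v_n @ np.linalg.solve(S_n, v_n))


def chi2_dual(x, u):
    """Maximise (3.3), with P_n in place of P, over (a_0, a_1, ..., a_k)."""
    Fx = (x[:, None] <= u[None, :]).astype(float)
    n, k = Fx.shape
    v_n = u - Fx.mean(axis=0)
    B = np.column_stack([np.ones(n), Fx])            # basis {1, f_1, ..., f_k}
    H = B.T @ B / n                                  # P_n of products of basis functions
    rhs = 2.0 * np.concatenate([[0.0], v_n])         # first-order conditions (3.5)-(3.6)
    c = np.linalg.solve(H, rhs)                      # c = (a_0, a_1, ..., a_k)
    f = B @ c                                        # f* evaluated at the sample points
    return float(c[1:] @ v_n - 0.25 * np.mean(f ** 2))


rng = np.random.default_rng(1)
x = rng.beta(2.0, 2.0, size=200)                     # an arbitrary non-uniform sample
u = np.array([0.25, 0.5, 0.75])
print(chi2_closed_form(x, u), chi2_dual(x, u))       # the two values agree
```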



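3.1.2. Almost sure convergence. An envelope for $\mathcal{F}$ is a function $F$ such that $|f| \le F$ for all $f$ in $\mathcal{F}$.

Theorem 3.2. Assume that $\chi^2(\Omega,P)$ is finite. Let $\mathcal{F}$ be a finite class of functions as in (3.1) with an envelope function $F$ such that $PF^2 < \infty$. Then $|\chi^2_n - \chi^2(\Omega,P)| \to 0$, $P$-a.s.

Proof. From Proposition 3.1,

$$\left|\chi^2_n - \chi^2(\Omega,P)\right| \le \left|v_n'\left(S_n^{-1} - S^{-1}\right) v_n\right| + \left|v_n' S^{-1} v_n - v' S^{-1} v\right|.$$

For $x$ in $\mathbb{R}^k$ denote by $\|x\|$ the Euclidean norm. On the space of $k \times k$ matrices introduce the operator norm $|||A||| = \sup_{\|x\| \le 1} \|Ax\| = \sup_{\|x\| = 1} \|Ax\|$. All entries of $A$ satisfy $|a(i,j)| \le |||A|||$. Moreover, if $A$ is symmetric with eigenvalues $|\lambda_1| \le |\lambda_2| \le \ldots \le |\lambda_k|$, then $|||A||| = |\lambda_k|$. Observe further that if $|a(i,j)| \le \varepsilon$ for all $(i,j)$, then for any $x \in \mathbb{R}^k$ such that $\|x\| = 1$,

$$\|Ax\|^2 = \sum_{i=1}^k \left(\sum_{j=1}^k a(i,j) x_j\right)^2 \le k \varepsilon^2 \left(\sum_{j=1}^k |x_j|\right)^2 \le k^2 \varepsilon^2,$$

i.e. $|||A||| \le k\varepsilon$.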
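The two matrix facts used above can be checked quickly by simulation. First, each entry of $A$ is bounded by the operator norm. Second, entries bounded by $\varepsilon$ give $|||A||| \le k\varepsilon$. The sketch below assumes NumPy, where `np.linalg.norm(A, 2)` returns the largest singular value, i.e. $|||A|||$. The choices of $k$, $\varepsilon$ and the uniform random matrices are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)
k, eps = 6, 0.01
for _ in range(1000):
    A = rng.uniform(-eps, eps, size=(k, k))    # all entries satisfy |a(i,j)| <= eps
    op_norm = np.linalg.norm(A, 2)              # |||A||| = largest singular value
    assert np.abs(A).max() <= op_norm + 1e-15   # each entry is bounded by |||A|||
    assert op_norm <= k * eps                   # the bound |||A||| <= k * eps
print("bounds hold on all draws")
```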

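For the first term in the RHS of the above display,

$$A := v_n'\left(S_n^{-1} - S^{-1}\right) v_n = v_n' S^{-1/2}\left(S^{1/2} S_n^{-1} S^{1/2} - I\right) S^{-1/2} v_n,$$

so that

$$|A| \le \left\|S^{-1/2} v_n\right\|^2\, \left|\left|\left| S^{1/2} S_n^{-1} S^{1/2} - I \right|\right|\right| \le \mathrm{cst}\cdot k\, \left|\left|\left| S^{1/2} S_n^{-1} S^{1/2} - I \right|\right|\right|.$$

Hence if $B := |||S^{1/2} S_n^{-1} S^{1/2} - I|||$ tends to 0 a.s., so does $A$.
First note that

$$S_n^{-1} = (S + S_n - S)^{-1} = S^{-1/2}\left(I - S^{-1/2}(S - S_n)S^{-1/2}\right)^{-1} S^{-1/2} = S^{-1/2}\left(I + \sum_{h=1}^\infty \left(S^{-1/2}(S - S_n)S^{-1/2}\right)^h\right) S^{-1/2}.$$

Hence

$$S^{1/2} S_n^{-1} S^{1/2} - I = \sum_{h=1}^\infty \left(S^{-1/2}(S - S_n)S^{-1/2}\right)^h,$$

which entails

$$\left|\left|\left| S^{1/2} S_n^{-1} S^{1/2} - I \right|\right|\right| \le \sum_{h=1}^\infty \left(\lambda_1^{-1}\, |||S - S_n|||\right)^h = O_P\left(\lambda_1^{-1} k\, C\right), \qquad C := \sup_{i,j} |s_n(i,j) - s(i,j)|,$$

where $\lambda_1$ is the smallest eigenvalue of $S$.
Since

$$C \le \sup_{i,j} \left|(P_n - P) f_i f_j\right| + \sup_i \left|(P_n - P) f_i\right| \left|(P_n + P) F\right|, \qquad (3.8)$$

the LLN implies that $C$ tends to 0 a.s., which in turn implies that $B$ tends to 0.

Now consider the second term. Since $\left|v_n' S^{-1} v_n - v' S^{-1} v\right| = \left|(v_n + v)' S^{-1} (v_n - v)\right|$, it tends to 0 by the LLN. □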

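3.1.3. Asymptotic distribution of the test statistic. We then have

Theorem 3.3. Let $\Omega$ be defined by (3.1) and let $\mathcal{F}$ be a finite class of linearly independent functions with envelope function $F$ such that $PF^2 < \infty$. Set $k = \operatorname{card}\{\mathcal{F}\}$. Then, under $H_0$,

$$n \chi^2_n \xrightarrow{d} \chi^2(k),$$

where $\chi^2(k)$ denotes a chi-square distribution with $k$ degrees of freedom.

Proof. For $P$ in $\Omega$, $\sqrt{n}\, v_n = -\gamma_n$. Therefore

$$n\chi^2_n = \gamma_n' S^{-1} \gamma_n + \gamma_n'\left(S_n^{-1} - S^{-1}\right)\gamma_n.$$

By the CLT and the continuity of the mapping $h(y) = y' S^{-1} y$, the first term has a limiting $\chi^2(k)$ distribution.

It remains to prove that the second term is negligible. Again as in the proof of Theorem 3.2,

$$\left|\gamma_n'\left(S_n^{-1} - S^{-1}\right)\gamma_n\right| \le \left\|S^{-1/2}\gamma_n\right\|^2 \left|\left|\left| S^{1/2} S_n^{-1} S^{1/2} - I \right|\right|\right|,$$

and $\|S^{-1/2}\gamma_n\|^2 = O_P(k)$. It is therefore enough to show that $|||S^{1/2} S_n^{-1} S^{1/2} - I|||$ is $o_P(1)$. This follows from the proof of Theorem 3.2. □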
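Theorem 3.3 can be illustrated by a small Monte Carlo experiment. The sketch below assumes NumPy and takes, as a hypothetical example, $k = 4$ constraints $f_i = \mathbb{1}\{x \le u_i\}$ with $Q_0 = U[0,1]$ and data drawn from $U[0,1]$, so that $H_0$ holds. It computes $n\chi^2_n$ through Proposition 3.1 (i) and compares the simulated distribution with $\chi^2(4)$, which has mean 4, variance 8 and 95% quantile about 9.488. The sample size and number of replications are arbitrary choices.

```python
import numpy as np


def statistic(x, u):
    """n * chi^2_n with f_i = 1{x <= u_i} and Q0 = U[0,1] (Proposition 3.1 (i))."""
    Fx = (x[:, None] <= u[None, :]).astype(float)
    v_n = u - Fx.mean(axis=0)
    S_n = np.cov(Fx, rowvar=False, bias=True)
    return len(x) * float(v_n @ np.linalg.solve(S_n, v_n))


rng = np.random.default_rng(3)
u = np.array([0.2, 0.4, 0.6, 0.8])          # k = 4 linearly independent constraints
stats = np.array([statistic(rng.uniform(size=400), u) for _ in range(2000)])

# Under H0 the limit is chi^2(4): mean 4, variance 8, 95% quantile about 9.488.
print(stats.mean(), stats.var(), np.mean(stats > 9.488))
```

The simulated mean, variance and rejection rate should lie close to $4$, $8$ and $0.05$ respectively.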
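The asymptotic behavior of $\chi^2_n$ under $H_1$ is captured by

$$\sqrt{n}\left(\chi^2_n - \chi^2(\Omega,P)\right) = -2\gamma_n' S^{-1} v + \sqrt{n}\, v' S^{-1/2}\left(S^{1/2} S_n^{-1} S^{1/2} - I\right) S^{-1/2} v - 2\gamma_n' S^{-1/2}\left(S^{1/2} S_n^{-1} S^{1/2} - I\right) S^{-1/2} v + n^{-1/2}\gamma_n' S_n^{-1}\gamma_n. \qquad (3.9)$$

This proves that the test based on $n\chi^2_n$ is asymptotically consistent.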
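The consistency statement can also be illustrated numerically. By (3.9), $\chi^2_n$ converges to $\chi^2(\Omega,P) = v' S^{-1} v > 0$ under $H_1$, so $n\chi^2_n$ grows linearly in $n$. In the sketch below (NumPy assumed), the data are Beta(2,2), whose distribution function is $3t^2 - 2t^3$, and the hypothetical constraints are $f_i = \mathbb{1}\{x \le u_i\}$ with $Q_0 = U[0,1]$. For $u = (1/4, 1/2, 3/4)$ the limit $v'S^{-1}v$ equals $9/55 \approx 0.164$.

```python
import numpy as np


def statistic(x, u):
    """n * chi^2_n with f_i = 1{x <= u_i} and Q0 = U[0,1] (Proposition 3.1 (i))."""
    Fx = (x[:, None] <= u[None, :]).astype(float)
    v_n = u - Fx.mean(axis=0)
    S_n = np.cov(Fx, rowvar=False, bias=True)
    return len(x) * float(v_n @ np.linalg.solve(S_n, v_n))


def beta22_cdf(t):
    return 3.0 * t ** 2 - 2.0 * t ** 3           # distribution function of Beta(2,2)


u = np.array([0.25, 0.5, 0.75])
Fu = beta22_cdf(u)
v = u - Fu                                       # (Q0 - P) f_i with Q0 = U[0,1]
S = np.minimum.outer(Fu, Fu) - np.outer(Fu, Fu)  # Cov_P(1{X <= u_i}, 1{X <= u_l})
target = float(v @ np.linalg.solve(S, v))        # chi^2(Omega, P), Proposition 3.1 (ii)

rng = np.random.default_rng(4)
sizes = [500, 5000, 50000]
stats = [statistic(rng.beta(2.0, 2.0, size=n), u) for n in sizes]
for n, s in zip(sizes, stats):
    print(n, s, s / n)                           # s grows like n * target
print(target)
```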
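3.2. Infinite number of linear constraints, an approach by sieves. In various cases $\Omega$ is defined through a countable collection of linear constraints. An example is presented in Section 3.3. Suppose thus that $\Omega$ is defined as in (3.1), with $\mathcal{F}$ an infinite class of functions

$$\mathcal{F} = \{f_\alpha : \mathbb{R}^d \to \mathbb{R},\ \alpha \in A\},$$

where $A \subset \mathbb{N}$ is a countable set of indices and $\operatorname{card}(\mathcal{F}) = \operatorname{card}(A) = \infty$. Thus $\Omega = \{Q \in M : Qf = Q_0 f,\ f \in \mathcal{F}\}$ for some $Q_0$ in $M$.

Assume that the projection $Q^*$ exists in $\Omega$. Then, by Theorem 2.2 (3), $q^*$ belongs to the closure of $\langle\mathcal{F}\rangle$.

We approximate $\Omega$ through a suitable increasing sequence of classes of functions $\mathcal{F}_n$, with finite cardinality $k = k(n)$ increasing with $n$. Each $\mathcal{F}_n$ induces a subset $\Omega_n$ containing $\Omega$.

Define therefore $\{\mathcal{F}_n\}_{n\ge1}$ such that

$$\mathcal{F}_n \subset \mathcal{F}_{n+1} \subset \mathcal{F} \quad \text{for all } n \ge 1, \qquad (3.10)$$

$$\mathcal{F} = \bigcup_{n\ge1} \mathcal{F}_n \qquad (3.11)$$

and

$$\Omega_n = \{Q : Qf = Q_0 f,\ f \in \mathcal{F}_n\}.$$

We thus have $\Omega_n \supset \Omega_{n+1}$ for $n \ge 1$, and $\Omega = \bigcap_n \Omega_n$.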

The idea of determining the projection of a measure $P$ on a set $\Omega$ through an approximating sequence of sets, or sieve, was introduced in this setting in [27].

Theorem 3.4 (Teboulle-Vajda, 1993). With the above notation, define $Q_n^*$ as the projection of $P$ on $\Omega_n$. Suppose that the above assumptions on $\{\Omega_n\}_{n\ge1}$ hold and that $\Omega_n \supset \Omega$ for each $n \ge 1$. Then

$$\lim_{n\to\infty} \left\| \frac{dQ_n^*}{dP} - \frac{dQ^*}{dP} \right\|_{L_1(P)} = 0. \qquad (3.12)$$

By Scheffé's Lemma this is equivalent to $\lim_{n\to\infty} d_{var}(Q_n^*, Q^*) = 0$, where $d_{var}(Q,P) = \sup_{A \in \mathcal{B}(\mathbb{R}^d)} |Q(A) - P(A)|$ is the variation distance between the p.m.'s $P$ and $Q$. When $\sup_{f\in\mathcal{F}} \sup_x |f(x)| < \infty$, (3.12) implies

$$\lim_{n\to\infty} \chi^2(\Omega_n,P) = \chi^2(\Omega,P). \qquad (3.13)$$

This result shows that a sequence of estimators of $\chi^2(\Omega,P)$ can be built by letting $k = k(n)$ grow to infinity together with $n$. Define

$$\chi^2_{n,k} := \sup_{f \in \langle\mathcal{F}_n\rangle} (Q_0 - P_n) f - \frac{1}{4} P_n f^2.$$

In the following section we consider conditions on $k(n)$ entailing the asymptotic normality of the suitably normalized sequence of estimates $\chi^2_{n,k}$ when $P$ belongs to $\Omega$, i.e. under $H_0$.
3.2.1. Convergence in distribution under $H_0$. As a consequence of Theorem … $k$ tends to infinity with probability 1 as $n \to \infty$.
We consider the statistics

$$\frac{n\chi^2_{n,k} - k}{\sqrt{2k}}, \qquad (3.14)$$

which will be seen to have a nondegenerate distribution as $k(n)$ tends to infinity together with $n$.

As in [14] and [15], the main tool in the proof of the asymptotic normality of (3.14) is the strong approximation of the empirical processes. We briefly recall some useful notions.

Definition 3.5. A class of functions $\mathcal{F}$ is pregaussian if there exists a version $B_P(\cdot)$ of $P$-Brownian bridges uniformly continuous in $\ell^\infty(\mathcal{F})$ with respect to the metric $\rho_P(f,g) = \left(\mathrm{Var}_P |f - g|\right)^{1/2}$. Here $\ell^\infty(\mathcal{F})$ is the Banach space of all uniformly bounded functionals $H: \mathcal{F} \to \mathbb{R}$, with norm $\|H\|_{\mathcal{F}} = \sup_{f\in\mathcal{F}} |H(f)|$.

For some $\alpha > 0$, let $\delta_n$ be a decreasing sequence with $\delta_n = o(n^{-\alpha})$.

Definition 3.6. A class of functions $\mathcal{F}$ is Komlós-Major-Tusnády (KMT) with respect to $P$, with rate $\delta_n$ (written $\mathcal{F} \in KMT(\delta_n; P)$), iff it is pregaussian and there exists a version $B_n^0(\cdot)$ of $P$-Brownian bridges such that for any $t > 0$

$$\Pr\left\{\sup_{f\in\mathcal{F}} \left|\sqrt{n}(P_n - P)f - B_n^0(f)\right| > \delta_n(t + b\log n)\right\} \le c\, e^{-\theta t}, \qquad (3.15)$$

where the positive constants $b$, $c$ and $\theta$ depend on $\mathcal{F}$ only.

We refer to [3], [52], [5], and […] for examples of classical and useful KMT classes, together with calculations of rates. We will use the fact that a KMT class is also a Donsker class.

From (3.15) and the Borel-Cantelli lemma it follows that

$$\sup_{f\in\mathcal{F}} \left|\gamma_n(f) - B_n^0(f)\right| = O(\delta_n \log n) \qquad (3.16)$$

a.s. Here, with the same notation as in the finite case (see (3.4)), $\gamma_n(f) = \sqrt{n}(P_n - P)f$ is the empirical process indexed by $f \in \mathcal{F}$.

Let $\{\mathcal{F}_n\}_{n\ge1}$ be a sequence of classes of linearly independent functions satisfying (3.10).

For any $n$, let $\gamma_{n,k} = \gamma_n(\mathcal{F}_n)$ (resp. $B^0_{n,k}$) be the $k$-dimensional vector obtained by restricting the empirical process $\gamma_n$ (resp. the $P$-Brownian bridge $B_n^0$) defined on $\mathcal{F}$ to the subset $\mathcal{F}_n$. Then, if $\mathcal{F}_n = \{f_1^{(n)}, \ldots, f_k^{(n)}\}$,

$$\gamma_{n,k} = \left(\gamma_n(f_1^{(n)}), \ldots, \gamma_n(f_k^{(n)})\right) \quad\text{and}\quad B^0_{n,k} = \left(B_n^0(f_1^{(n)}), \ldots, B_n^0(f_k^{(n)})\right).$$

Denote by $S_k$ the covariance matrix of the vector $\gamma_{n,k}$ and by $S_{n,k}$ its empirical covariance matrix. Let $\lambda_{1,k}$ be the smallest eigenvalue of $S_k$.
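Theorem 3.7. Let $\mathcal{F}$ have an envelope $F$ and be $KMT(\delta_n; P)$ for some sequence $\delta_n \downarrow 0$. Define further a sequence $\{\mathcal{F}_n\}_{n\ge1}$ of classes of linearly independent functions satisfying (3.10). Moreover, let $k$ satisfy

$$\lim_{n\to\infty} k(n) = \infty,$$

$$\lim_{n\to\infty} \lambda_{1,k}^{-1/2}\, k^{1/2}\, \delta_n \log n = 0, \qquad (3.17)$$

$$\lim_{n\to\infty} \lambda_{1,k}^{-1}\, k^{3/2}\, n^{-1/2} = 0. \qquad (3.18)$$

Then under $H_0$

$$\frac{n\chi^2_{n,k} - k}{\sqrt{2k}} \xrightarrow{d} N(0,1).$$

Proof. By Proposition 3.1, under $H_0$ we have $n\chi^2_{n,k} = \gamma_{n,k}' S_{n,k}^{-1} \gamma_{n,k}$, so that

$$\frac{n\chi^2_{n,k} - k}{\sqrt{2k}} = \frac{(B^0_{n,k})' S_k^{-1} B^0_{n,k} - k}{\sqrt{2k}} + 2(2k)^{-1/2} (B^0_{n,k})' S_k^{-1}\left(\gamma_{n,k} - B^0_{n,k}\right) + (2k)^{-1/2}\left(\gamma_{n,k} - B^0_{n,k}\right)' S_k^{-1}\left(\gamma_{n,k} - B^0_{n,k}\right) + (2k)^{-1/2}\gamma_{n,k}'\left(S_{n,k}^{-1} - S_k^{-1}\right)\gamma_{n,k}$$
$$= A + B + C + D.$$

The first term above can be written

$$A = \frac{\left(S_k^{-1/2}B^0_{n,k}\right)'\left(S_k^{-1/2}B^0_{n,k}\right) - k}{\sqrt{2k}} = \frac{\sum_{i=1}^k \left(Z_i^2 - E Z_i^2\right)}{\sqrt{k\, \mathrm{Var}\, Z_i^2}},$$

which converges weakly to the standard normal distribution by the CLT applied to the i.i.d. standard normal r.v.'s $Z_i$ (note that $\mathrm{Var}\, Z_i^2 = 2$).

For the term $C$, it is straightforward that $C = o(B)$. From the proof of Theorem 3.3, $D$ goes to zero if

$$k^{3/2}\, \lambda_{1,k}^{-1} \sup_{i,j} \left|s_{n,k}(i,j) - s_k(i,j)\right| = o_P(1).$$

Using (3.8),

$$\sup_{i,j} \left|s_{n,k}(i,j) - s_k(i,j)\right| \le \sup_{f,g\in\mathcal{F}} \left|(P_n - P)fg\right| + \sup_{f\in\mathcal{F}} \left|(P_n - P)f\right| \left|(P_n + P)F\right| = O_P(n^{-1/2}).$$

Since moreover $\gamma_{n,k}' S_k^{-1}\gamma_{n,k} = O_P(k)$, we are done by (3.18).

For $B$,

$$|B| \le 2(2k)^{-1/2}\left\|S_k^{-1/2}B^0_{n,k}\right\|\,\left\|S_k^{-1/2}\left(\gamma_{n,k} - B^0_{n,k}\right)\right\|,$$

where, as in $A$, $\|S_k^{-1/2}B^0_{n,k}\|^2 = \sum_{i=1}^k Z_i^2$, and the $Z_i^2$ are i.i.d. with a $\chi^2$ distribution with 1 df. Hence $\left(\sum_{i=1}^k Z_i^2\right)^{1/2} = O_P(k^{1/2})$. Further,

$$\left\|S_k^{-1/2}\left(\gamma_{n,k} - B^0_{n,k}\right)\right\| \le \lambda_{1,k}^{-1/2}\left\|\gamma_{n,k} - B^0_{n,k}\right\| \le \lambda_{1,k}^{-1/2}\, k^{1/2} \sup_{f\in\mathcal{F}} \left|\gamma_n(f) - B_n^0(f)\right|.$$

By (3.16) it follows that $B = O_P\left(\lambda_{1,k}^{-1/2}\, k^{1/2}\, \delta_n \log n\right) = o_P(1)$ if (3.17) holds. We have used the fact that $P$ belongs to $\Omega$ in this evaluation of $B$. □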



Remark 3.8. Under $H_1$, using the relation $\sqrt{n}\, v_{n,k} = \sqrt{n}\, v_k - \gamma_{n,k}$, we can write

$$\frac{n\chi^2_{n,k} - k}{\sqrt{2k}} = (2k)^{-1/2}\left(n\, v_k' S_{n,k}^{-1} v_k - 2\sqrt{n}\, \gamma_{n,k}' S_{n,k}^{-1} v_k\right) + O_P(1),$$

where the $O_P(1)$ term is $(2k)^{-1/2}\left(\gamma_{n,k}' S_{n,k}^{-1}\gamma_{n,k} - k\right)$, which coincides with the test statistic under $H_0$. We can bound the first term from below by

$$(2k)^{-1/2}\, n\, v_k' S_k^{-1} v_k \left(1 - O_P\left(\left|\left|\left|S_k^{1/2}S_{n,k}^{-1}S_k^{1/2} - I\right|\right|\right|\right)\right) - O_P(n^{1/2})\left(1 + O_P\left(\left|\left|\left|S_k^{1/2}S_{n,k}^{-1}S_k^{1/2} - I\right|\right|\right|\right)\right).$$

Hence, if (3.17) and (3.18) are satisfied, the test statistic is asymptotically consistent also for the case of an infinite number of linear constraints.

Both conditions (3.17) and (3.18) involve $\lambda_{1,k}$, which cannot be estimated without further hypotheses on the structure of the class $\mathcal{F}$. However, in concrete problems, once $\mathcal{F}$ is defined, it is possible to give bounds for $\lambda_{1,k}$ depending on $k$. This is shown in the last section for a particular class of goodness-of-fit tests.

3.3. Application: testing marginal distributions. Let $P$ be an unknown distribution on $\mathbb{R}^d$ with density bounded from below. We consider goodness-of-fit tests for the marginal distributions $P_1, \ldots, P_d$ of $P$ on the basis of an i.i.d. sample $(X_1, \ldots, X_n)$.

Let $Q_1, \ldots, Q_d$ denote $d$ distributions on $\mathbb{R}$. The null hypothesis is $H_0: P_j = Q_j$ for $j = 1, \ldots, d$. That is, we simultaneously test the goodness-of-fit of the marginal laws $P_1, \ldots, P_d$ to the laws $Q_1, \ldots, Q_d$. Through the transform $P'(y_1, \ldots, y_d) = P\left(Q_1^{-1}(y_1), \ldots, Q_d^{-1}(y_d)\right)$, we can restrict the analysis to the case where all p.m.'s have support $[0,1]^d$ and, under $H_0$, uniform marginal laws on $[0,1]$. So without loss of generality we write $Q_0$ for the uniform distribution on $[0,1]$.

P.J. Bickel, Y. Ritov and J.A. Wellner [3] studied the estimation of linear functionals of the probability measure when the marginal laws are known, for r.v.'s with a.c. distribution, letting the number of cells grow to infinity.

Define the class

$$\mathcal{F} = \left\{\mathbb{1}_{u,j} : [0,1]^d \to \{0,1\},\ j = 1, \ldots, d,\ u \in [0,1]\right\},$$

where $\mathbb{1}_{u,j}(x_1, \ldots, x_d) = \mathbb{1}\{x_j \le u\}$.
Let $\Omega$ be the set of all p.m.'s on $[0,1]^d$ with uniform marginals, i.e.

$$\Omega = \left\{Q \in M_1([0,1]^d) \text{ such that } Qf = \int_{[0,1]^d} f(x)\,dx,\ f \in \mathcal{F}\right\}. \qquad (3.19)$$

This set $\Omega$ has the form (3.1), where $\mathcal{F}$ is the class of characteristic functions of intervals. This is a KMT class with rate $\delta_n = n^{-1/(2d)}$ ($\delta_n = n^{-1/4}$ if $d = 2$); see …

We now build the family $\{\mathcal{F}_n\}_{n\ge1}$ satisfying (3.10) and (3.11).

Let $m = m(n)$ tend to $+\infty$ with $n$. Let $0 < u_1 < \ldots < u_m < 1$, and let $U^{(n)}$ be the $m \cdot d$ points of $[0,1]^d$ having one coordinate in $\{u_1, \ldots, u_m\}$ and all other coordinates equal to 1.

Let $\mathcal{F}_n$ denote the class of characteristic functions of the $d$-dimensional rectangles $[0,u]$ for $u \in U^{(n)}$. Hence $\operatorname{card}\{\mathcal{F}_n\} = k = m \cdot d$.

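Namely,

$$\mathcal{F}_n = \left\{\mathbb{1}_{u_i,j} : [0,1]^d \to \{0,1\},\ j = 1, \ldots, d,\ u_i \in (0,1),\ u_i < u_{i+1},\ i = 1, \ldots, m\right\}, \qquad (3.20)$$

which satisfies $\mathcal{F}_n \subseteq \mathcal{F}_{n+1}$ for all $n \ge 1$ (i.e. (3.10)) and $\mathcal{F} = \bigcup_{n\ge1}\mathcal{F}_n$ (i.e. (3.11)). The sequence $\{\mathcal{F}_n\}_{n\ge1}$ and the class $\mathcal{F}$ satisfy the conditions of Theorem 3.7. Indeed, $\mathcal{F}$ (and consequently each $\mathcal{F}_n$) has envelope function $F = 1$, and $RF^h = 1$ for all $R$ in $M_1([0,1]^d)$ and $h$ in $\mathbb{R}$.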
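The sieve test for uniform marginals can be sketched in a few lines. This sketch assumes NumPy and an equispaced grid $u_i = i/(m+1)$, which is one possible choice satisfying (3.21), with the columns of $\mathcal{F}_n$ ordered as in the text. The function `normalized_statistic` is hypothetical. It computes the standardized statistic (3.14) through Proposition 3.1 (i). The $H_0$ sample has uniform but dependent coordinates, while the $H_1$ sample has a Beta(2,2) first marginal. Both data-generating choices are illustrative only.

```python
import numpy as np


def normalized_statistic(x, m):
    """(n * chi^2_{n,k} - k) / sqrt(2k) for uniform marginals on [0,1]^d.

    x : (n, d) sample. The sieve uses the grid u_i = i/(m+1), i = 1..m, in each
    coordinate, so F_n = {1{x_j <= u_i}} has k = m*d elements.
    """
    n, d = x.shape
    u = np.arange(1, m + 1) / (m + 1)
    # Columns ordered as in the text: all u_i for coordinate 1, then coordinate 2, ...
    Fx = np.concatenate([x[:, [j]] <= u[None, :] for j in range(d)], axis=1).astype(float)
    q0 = np.tile(u, d)                           # Q0 1{x_j <= u_i} = u_i under H0
    v_n = q0 - Fx.mean(axis=0)
    S_n = np.cov(Fx, rowvar=False, bias=True)
    k = m * d
    stat = n * float(v_n @ np.linalg.solve(S_n, v_n))
    return (stat - k) / np.sqrt(2.0 * k)


rng = np.random.default_rng(6)
n, m = 2000, 10

# H0: uniform marginals but dependent coordinates (X2 = X1 with probability 1/2).
x1 = rng.uniform(size=n)
copy = rng.uniform(size=n) < 0.5
x2 = np.where(copy, x1, rng.uniform(size=n))
h0 = normalized_statistic(np.column_stack([x1, x2]), m)

# H1: the first marginal is Beta(2,2), not uniform.
h1 = normalized_statistic(np.column_stack([rng.beta(2.0, 2.0, size=n), rng.uniform(size=n)]), m)
print(h0, h1)   # h0 is of order 1, h1 is large
```

Dependence between coordinates is allowed under $H_0$, since only the marginals are constrained. This is why the $H_0$ sample here is not independent.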

To establish a lower bound for $\lambda_{1,k}$, the smallest eigenvalue of $S_k$, we require that the volumes of the cells in the grid defined by the $u_i^{(n)}$ do not shrink to 0 too rapidly. Suppose that the intervals $(u_i, u_{i+1}]$ are such that

$$0 < \liminf_{n\to\infty} \min_{i=1,\ldots,m-1} k\,(u_{i+1} - u_i) \le \limsup_{n\to\infty} \max_{i=1,\ldots,m-1} k\,(u_{i+1} - u_i) < \infty. \qquad (3.21)$$

Remark 3.9. Condition (3.21) for the sequence $\{\mathcal{F}_n\}_{n\ge1}$ to converge to $\mathcal{F}$ coincides with (F2) and (F3) in [3].

We first obtain an estimate for the eigenvalue $\lambda_{1,k}$. The final result of this step is stated in Lemma 3.11 below.

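Let $P$ belong to $\Omega$. Let us then write the matrix $S_k$. We have $P\mathbb 1_{u_i,j} = Q_0\mathbb 1_{u_i,j} = u_i$ for $i = 1,\dots,m$ and $j = 1,\dots,d$. Set $P\mathbb 1_{u_i,j}\mathbb 1_{u_l,h} = P(X_j \le u_i,\ X_h \le u_l)$ for every $h, j = 1,\dots,d$ and $l, i = 1,\dots,m$. When $j = h$, then $P\mathbb 1_{u_i,j}\mathbb 1_{u_l,j} = P(X_j \le u_i \wedge u_l) = u_i \wedge u_l$.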

Consider for the vector of functions $f_j$ the following ordering
$$(f_1,\dots,f_m,f_{m+1},\dots,f_{2m},\dots,f_{(d-1)m+1},\dots,f_{dm}) = (\mathbb 1_{u_1,1},\dots,\mathbb 1_{u_m,1},\mathbb 1_{u_1,2},\dots,\mathbb 1_{u_m,d}).$$

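The generic term of $S_k$ writes
$$S_k(u,v) = S_k\big((j-1)m+i,\,(h-1)m+l\big) = P\mathbb 1_{u_i,j}\mathbb 1_{u_l,h} - P\mathbb 1_{u_i,j}P\mathbb 1_{u_l,h} = \begin{cases} u_i - u_i^2, & \text{if } j = h,\ i = l,\\ P(X_j \le u_i \wedge u_l) - u_iu_l, & \text{if } j = h,\ i \ne l,\\ P(X_j \le u_i,\ X_h \le u_l) - u_iu_l, & \text{if } j \ne h.\end{cases}$$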
We make use of the class of functions
$$\tilde{\mathcal F}_n = \left\{\tilde f_{ji} = f_{ji} - f_{j(i-1)} : i = 1,\dots,m,\ j = 1,\dots,d,\ f_{ji} \in \mathcal F_n,\ f_{j0} = 0\right\} = \left\{\mathbb 1_{A_i^j} : i = 1,\dots,m,\ j = 1,\dots,d\right\}.$$

In the above display the $A_i^j$ describe the partition of $[0,1]$, the support of the marginal distribution $P_j$, induced by the vector $\{u_i\}_{i\le m}$. Namely, for every $j = 1,\dots,d$, we have $A_i^j \cap A_l^j = \emptyset$ for $i \ne l$ and $\bigcup_{i=1}^{m+1}A_i^j = [0,1]$, with $A_{m+1}^j = (u_m, 1]$, for all $j$.
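The passage from $\mathcal F_n$ to $\tilde{\mathcal F}_n$ is a differencing of cumulative indicators. The minimal NumPy sketch below, for one coordinate, assumes a hypothetical grid `u` and hypothetical observations `x` (the function names are illustrative, not taken from the paper). It shows that the differences are the indicators of the cells $(u_{i-1}, u_i]$ and that, together with $A_{m+1}^j$, they partition the range of the coordinate.

```python
import numpy as np

def cumulative_indicators(x, u):
    """Rows: observations x_j; columns: f_{ji}(x) = 1{x_j <= u_i}, i = 1..m."""
    return (x[:, None] <= u[None, :]).astype(int)

def cell_indicators(x, u):
    """tilde f_{ji} = f_{ji} - f_{j(i-1)}, with f_{j0} = 0, i.e. 1{u_{i-1} < x_j <= u_i}."""
    F = cumulative_indicators(x, u)
    previous = np.hstack([np.zeros((len(x), 1), dtype=int), F[:, :-1]])
    return F - previous

u = np.array([0.2, 0.5, 0.9])            # hypothetical grid u_1 < u_2 < u_3 (m = 3)
x = np.array([0.1, 0.2, 0.3, 0.95])      # hypothetical values of one coordinate X_j
C = cell_indicators(x, u)
last_cell = 1 - C.sum(axis=1)            # indicator of A_{m+1} = (u_m, 1]
print(C.tolist(), last_cell.tolist())
```

Each row of `C`, completed by `last_cell`, contains exactly one 1, which is the partition property used above.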

Let $\tilde S_k$ be the covariance matrix of the vector $\tilde\gamma_n = \gamma(\tilde{\mathcal F}_n)$, and consider the vector $\tilde\nu_n$, built from $\tilde{\mathcal F}_n$, and the vector $\nu_n$, defined as in (3.4).



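$\tilde S_k$ has $((j-1)m+i,\,(h-1)m+l)$-th component equal to $P\mathbb 1_{A_i^j}\mathbb 1_{A_l^h} - P\mathbb 1_{A_i^j}P\mathbb 1_{A_l^h}$, which is
$$\begin{cases} p_i - p_i^2, & \text{if } j = h,\ i = l,\\ -p_ip_l, & \text{if } j = h,\ i \ne l,\\ P(u_{i-1} < X_j \le u_i,\ u_{l-1} < X_h \le u_l) - p_ip_l, & \text{if } j \ne h,\end{cases}$$
where we have written $p_i = P(u_{i-1} < X_j \le u_i) = u_i - u_{i-1}$ (with $u_0 = 0$), for all $j = 1,\dots,d$. Note that $\nu_n$ (and $\chi^2_{n,k}$) can be written using $\tilde{\mathcal F}_n$ instead of $\mathcal F_n$.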



Let $M$ be the block diagonal matrix with all diagonal blocks equal to the $(m\times m)$ lower triangular matrix whose nonzero entries all equal one. Then $\nu_n = M\tilde\nu_n$.

On the other hand, after some algebra it can be checked that $S_k = M\tilde S_kM'$. Thus
$$\nu_n'S_k^{-1}\nu_n = \tilde\nu_n'M'(M')^{-1}\tilde S_k^{-1}M^{-1}M\tilde\nu_n = \tilde\nu_n'\tilde S_k^{-1}\tilde\nu_n.$$
Similar arguments yield

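The matrix $M$ is invertible, all its eigenvalues being equal to one. Setting $y = M'x$, this allows us to compare $\lambda_{1,k}$ with $\lambda_{1,\tilde S}$, the minimum eigenvalue of $\tilde S_k$:
$$\lambda_{1,k} = \min_{x\ne0}\frac{x'S_kx}{\|x\|^2} = \min_{x\ne0}\frac{y'\tilde S_ky}{\|y\|^2}\,\frac{\|M'x\|^2}{\|x\|^2},$$
so that
$$\lambda_{1,\tilde S}\,\min_{x\ne0}\frac{\|M'x\|^2}{\|x\|^2} \le \lambda_{1,k} \le \lambda_{1,\tilde S}\,\max_{x\ne0}\frac{\|M'x\|^2}{\|x\|^2}.$$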
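The identity $S_k = M\tilde S_kM'$, the invariance of the quadratic form and the two-sided comparison between $\lambda_{1,k}$ and $\lambda_{1,\tilde S}$ can be checked numerically. The sketch below is an illustration under stated assumptions: $d = 2$, and a hypothetical law $P$ drawn at random on the $(m+1)^2$ grid cells (uniform marginals are not imposed, since these three facts do not need them). All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4                                    # hypothetical number of grid points per coordinate (d = 2)
P = rng.random((m + 1, m + 1))
P /= P.sum()                             # hypothetical cell probabilities P(A_i^1 x A_l^2)

def indicator_cov(P, masks):
    """Covariance matrix under P of cell-indicator functions given as boolean masks."""
    mean = np.array([P[a].sum() for a in masks])
    second = np.array([[P[a & b].sum() for b in masks] for a in masks])
    return second - np.outer(mean, mean)

idx = np.arange(m + 1)
shape = (m + 1, m + 1)
# tilde f: 1{X_1 in A_i}, then 1{X_2 in A_i}, i = 1..m
cells = [np.broadcast_to((idx == i)[:, None], shape) for i in range(m)] + \
        [np.broadcast_to((idx == i)[None, :], shape) for i in range(m)]
# f: 1{X_1 <= u_i}, then 1{X_2 <= u_i}, i = 1..m (union of the first i cells)
cums = [np.broadcast_to((idx <= i)[:, None], shape) for i in range(m)] + \
       [np.broadcast_to((idx <= i)[None, :], shape) for i in range(m)]

S_tilde = indicator_cov(P, cells)
S = indicator_cov(P, cums)
M = np.kron(np.eye(2), np.tril(np.ones((m, m))))   # block diagonal, lower triangular ones

nu_tilde = rng.standard_normal(2 * m)
nu = M @ nu_tilde
q = nu @ np.linalg.solve(S, nu)
q_tilde = nu_tilde @ np.linalg.solve(S_tilde, nu_tilde)

sv = np.linalg.svd(M, compute_uv=False)             # singular values of M (and of M')
lam_k = np.linalg.eigvalsh(S)[0]
lam_tilde = np.linalg.eigvalsh(S_tilde)[0]
print(np.allclose(S, M @ S_tilde @ M.T), np.isclose(q, q_tilde), lam_k, lam_tilde)
```

The factors $\min\|M'x\|^2/\|x\|^2$ and $\max\|M'x\|^2/\|x\|^2$ are the squared extreme singular values `sv`, which is why they appear in the checks.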

We now consider the covariance matrix of $\tilde\gamma_n$ under $H_0$ when the underlying distribution is $Q_0$, i.e. the uniform distribution on $[0,1]^d$. Denote this matrix by $\tilde S_k^0$. We then have

Lemma 3.10. If $P \in \Omega$, then

(i) $\tilde S_k^0 = D^{1/2}(I - V)D^{1/2}$, where $D$ and $V$ are both block diagonal matrices with diagonal blocks equal to $\operatorname{diag}\{p_i\}_{i=1,\dots,m}$ and to $U = \{\sqrt{p_ip_l}\}_{i,l=1,\dots,m}$, respectively.

(ii) The $(m\times m)$ matrix $U$ has eigenvalues equal to $1 - p_{m+1} = \sum_{i=1}^m p_i$, with multiplicity 1, and to $0$, with multiplicity $m - 1$. Moreover, $(I - U)^{-1} = I + \frac{1}{p_{m+1}}U$.

(iii) For any eigenvalue $\lambda$ of $\tilde S_k^0$ it holds
$$p_{m+1}\min_{1\le i\le m}p_i \le \lambda \le \max_{1\le i\le m}p_i.$$

Proof. (i) This can easily be checked through some calculation.

(ii) First notice that
$$U^2 = (1 - p_{m+1})U. \tag{3.22}$$
Formula (3.22) implies that every eigenvalue of $U$ equals either 0 or $1 - p_{m+1}$, and, since $U \ne 0$, that at least one eigenvalue equals $1 - p_{m+1}$. On the other hand, summing up all diagonal entries of $U$ we get $\operatorname{trace}(U) = \sum_{i=1}^m p_i = 1 - p_{m+1}$. This allows us to conclude that there can be only one eigenvalue equal to $1 - p_{m+1}$, while the others must be zero.

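For the second statement, by Taylor expansion of $(1 - x)^{-1}$, $(I - U)^{-1} = I + \sum_{h\ge1}U^h$, the series converging since the spectral radius of $U$ is $1 - p_{m+1} < 1$. Then, using (3.22) recursively, $(I - U)^{-1} = I + U\sum_{h\ge0}(1 - p_{m+1})^h = I + \frac{1}{p_{m+1}}U$.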

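(iii) For any eigenvalue $\lambda$ of $\tilde S_k^0$, denoting by $\lambda_{k,k}$ the largest one, we have:
$$\lambda \le \lambda_{k,k} = \|\tilde S_k^0\| \le \|D^{1/2}\|^2\,\|I - V\| = \max_{1\le i\le m}p_i\Big(1 - \inf_{\|x\|=1}x'Vx\Big) = \max_{1\le i\le m}p_i,$$
where for the last identity we have used the fact that the eigenvalues of $V$ coincide with the eigenvalues of $U$, with multiplicities multiplied by $d$.
For the opposite inequality consider
$$\frac{1}{\lambda} \le \|(\tilde S_k^0)^{-1}\| \le \|D^{-1}\|\,\|(I - V)^{-1}\| = \frac{1}{\min_{1\le i\le m}p_i}\left(1 + \frac{1 - p_{m+1}}{p_{m+1}}\right) = \frac{1}{p_{m+1}\min_{1\le i\le m}p_i}.$$

□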
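A quick numerical sanity check of Lemma 3.10, with hypothetical cell widths $p_1,\dots,p_{m+1}$ drawn at random and $d = 2$: the sketch builds $U$, $D^{1/2}$, $V$ and $\tilde S_k^0$ as in (i), then checks (3.22), the spectrum of $U$, the inverse formula of (ii) and the eigenvalue bounds of (iii). It illustrates the statement; it is not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 5, 2                                      # hypothetical sizes
w = rng.random(m + 1)
p = w / w.sum()                                  # hypothetical widths p_1, ..., p_{m+1}
pm, p_last = p[:m], p[m]

U = np.sqrt(np.outer(pm, pm))                    # U = {sqrt(p_i p_l)}_{i,l <= m}
D_half = np.kron(np.eye(d), np.diag(np.sqrt(pm)))
V = np.kron(np.eye(d), U)
S0 = D_half @ (np.eye(d * m) - V) @ D_half       # Lemma 3.10 (i)

eig_U = np.linalg.eigvalsh(U)                    # ascending order
eig_S0 = np.linalg.eigvalsh(S0)
print(eig_U, eig_S0.min(), p_last * pm.min(), eig_S0.max(), pm.max())
```

Each diagonal block of `S0` is $\operatorname{diag}(p) - pp'$, the covariance of the cell indicators of one coordinate under $Q_0$, and the off-diagonal blocks vanish because the coordinates are independent under $Q_0$.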



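Lemma 3.11. Suppose that $P$ has a density on $[0,1]^d$ bounded from below by $\alpha > 0$. Then the smallest eigenvalue of $\tilde S_k$ is bounded below by $\alpha\,p_{m+1}\min_{1\le i\le m}p_i$, and consequently so is $\lambda_{1,k}$, up to a positive factor not depending on $m$.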

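Proof. Write $\tilde S_k(u,v)$ for the $(u,v)$-th element of $\tilde S_k$. We have, for $P \in \Omega$, i.e. if $Pf = Q_0f$ for every $f \in \tilde{\mathcal F}_n$:
$$\tilde S_k\big((j-1)m+i,\,(h-1)m+l\big) = \tilde S_k(u,v) = P\tilde f_u\tilde f_v - P\tilde f_uP\tilde f_v = P\big(\tilde f_u - Q_0\tilde f_u\big)\big(\tilde f_v - Q_0\tilde f_v\big) = P\big(\bar f_u\bar f_v\big),$$
where $\bar f_u = \tilde f_u - Q_0\tilde f_u$. For each vector $a \in \mathbb R^{dm}$ it then holds
$$a'\tilde S_ka = \sum_{u=1}^{dm}\sum_{v=1}^{dm}a_ua_vP\big(\bar f_u\bar f_v\big) = P\Big(\sum_{u=1}^{dm}a_u\bar f_u\Big)^2 = \int\Big(\sum_{u=1}^{dm}a_u\bar f_u\Big)^2dP \ge \alpha\int\Big(\sum_{u=1}^{dm}a_u\bar f_u\Big)^2dQ_0$$
$$= \alpha\,a'\Big\{Q_0\big(\tilde f_u - Q_0\tilde f_u\big)\big(\tilde f_v - Q_0\tilde f_v\big)\Big\}_{u,v=1,\dots,dm}\,a = \alpha\,a'\tilde S_k^0a.$$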
The preceding inequality implies
$$\inf_{a\ne0}\frac{a'\tilde S_ka}{\|a\|^2} \ge \alpha\inf_{a\ne0}\frac{a'\tilde S_k^0a}{\|a\|^2}, \tag{3.23}$$
that is, a lower bound for the smallest eigenvalue of $\tilde S_k$ depending on the smallest eigenvalue of $\tilde S_k^0$.

Apply Lemma 3.10 (iii) to get the lower bound for $\lambda_{1,k}$. □
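To illustrate Lemma 3.11 one needs a law $P \in \Omega$ whose density is bounded below by $\alpha$. A convenient choice for $d = 2$ (ours, purely hypothetical, not from the paper) is the mixture $P = \alpha Q_0 + (1-\alpha)R$, where $R$ spreads mass $p_i$ uniformly over the diagonal cell $A_i^1\times A_i^2$. Both components have uniform marginals. The sketch checks that $\tilde S_k - \alpha\tilde S_k^0$ is positive semidefinite, which is the inequality behind (3.23), and then checks the resulting lower bound.

```python
import numpy as np

rng = np.random.default_rng(2)
m, alpha = 6, 0.3                                # hypothetical grid size and density lower bound
w = rng.random(m + 1)
p = w / w.sum()                                  # hypothetical widths of A_1, ..., A_{m+1}

# Hypothetical P in Omega (d = 2): alpha * Q0 + (1 - alpha) * R, where R puts mass p_i
# uniformly on the diagonal cell A_i x A_i. Both parts have uniform marginals, and the
# density of P on A_i x A_l equals P_il / (p_i p_l) >= alpha.
P = alpha * np.outer(p, p) + (1 - alpha) * np.diag(p)

def cell_cov(C, m):
    """Covariance of (1{X_1 in A_i})_{i<=m}, (1{X_2 in A_l})_{l<=m} for cell probabilities C."""
    rows, cols = C.sum(axis=1)[:m], C.sum(axis=0)[:m]
    top = np.diag(rows) - np.outer(rows, rows)
    bottom = np.diag(cols) - np.outer(cols, cols)
    cross = C[:m, :m] - np.outer(rows, cols)
    return np.block([[top, cross], [cross.T, bottom]])

S_tilde = cell_cov(P, m)                         # covariance under P
S0 = cell_cov(np.outer(p, p), m)                 # covariance under Q0
lam_min = np.linalg.eigvalsh(S_tilde)[0]
bound = alpha * p[m] * p[:m].min()               # alpha * p_{m+1} * min_i p_i
print(lam_min, bound)
```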

Remark 3.12. The existence of $\alpha > 0$ such that the density of $P$ on $[0,1]^d$ is bounded below by $\alpha$ seems necessary for this kind of approach; see assumption (P3) in [3].

From Theorem 3.7, using (3.21) in order to evaluate $p_{m+1}\min_{1\le i\le m}p_i$, together with the fact that the class $\mathcal F$ is KMT with rate $\delta_n = n^{1/(2d)}$, we obtain

Theorem 3.13. Let (3.21) hold. Assume that $P$ belongs to $\Omega$ defined by (3.19) and has a density bounded below by some positive number. Let further $k = d\cdot m(n)$
be a sequence such that lim„_>.oo k = oo and lim„_>.oo k'^^'^n^^^^ = 

Then $(n\chi^2_{n,k} - k)/\sqrt{2k}$ has a limiting standard normal distribution.

In the last part of this Section we show that the conditions in Theorem 3.13 can be weakened for small values of $d$. When $d = 2$, the rate for $k = 2m$ is achieved when condition (3.17) holds.

We consider the case $d = 2$; for larger values of $d$, see Remark 3.16.

In order to make the notation clearer, define $p_{ij}$ and $N_{ij}$ as $P(A_i^1\times A_j^2)$ and $nP_n(A_i^1\times A_j^2)$, respectively, where the events $A_i^h$, $h = 1,2$, $i = 1,\dots,m+1$, are as above. The marginal distributions will be denoted $p_{i\cdot} = p_{\cdot i} = p_i$ (since $H_0$ holds), and the empirical marginal distributions by $N_{i\cdot}/n$ and $N_{\cdot i}/n$.

Turning back to the proof of Theorem 3.7, we see that condition (3.18) is used in order to ensure that $(n\chi^2_{n,k} - \gamma'_{n,k}S_k^{-1}\gamma_{n,k})/\sqrt k$ tends to 0 in probability as $n$ tends to infinity, while condition (3.17) implies the convergence of $(\gamma'_{n,k}S_k^{-1}\gamma_{n,k} - 2m)/\sqrt{4m}$ to the standard normal distribution.
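Let
$$\mathcal Q = \left\{q \in M_1([0,1]^2) : \sum_{i=1}^{m+1}q_{ij} = q_{\cdot j} = p_{\cdot j} = u_j - u_{j-1},\ j = 1,\dots,m+1;\ \sum_{j=1}^{m+1}q_{ij} = q_{i\cdot} = p_{i\cdot} = u_i - u_{i-1},\ i = 1,\dots,m+1\right\},$$
where $q_{ij} = Q(A_i^1\times A_j^2)$, $u_0 = 0$ and $u_{m+1} = 1$; for instance, under $Q_0$, $q^0_{ij} = Q_0(A_i^1\times A_j^2) = (u_i - u_{i-1})(u_j - u_{j-1})$.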
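Lemma 3.14. When $P \in \Omega$, it holds
$$n\chi^2_{n,k} = \min_{q\in\mathcal Q}\sum_{i=1}^{m+1}\sum_{j=1}^{m+1}\frac{(nq_{ij} - N_{ij})^2}{N_{ij}}, \tag{3.24}$$
$$\gamma'_{n,k}S_k^{-1}\gamma_{n,k} = \min_{q\in\mathcal Q}\sum_{i=1}^{m+1}\sum_{j=1}^{m+1}\frac{(nq_{ij} - N_{ij})^2}{np_{ij}}. \tag{3.25}$$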
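Proof. We prove (3.24), since the proof of (3.25) is similar. Following [3], the right-hand side of (3.24) is
$$\sum_{i=1}^{m+1}\sum_{j=1}^{m+1}N_{ij}(a_i + b_j)^2,$$
where the vectors $a$ and $b \in \mathbb R^{m+1}$ are solutions of the equations
$$a_i\frac{N_{i\cdot}}{n} + \sum_{j=1}^{m+1}b_j\frac{N_{ij}}{n} = p_{i\cdot} - \frac{N_{i\cdot}}{n},\quad i = 1,\dots,m+1,$$
$$b_j\frac{N_{\cdot j}}{n} + \sum_{i=1}^{m+1}a_i\frac{N_{ij}}{n} = p_{\cdot j} - \frac{N_{\cdot j}}{n},\quad j = 1,\dots,m+1.$$
Let $\tilde a = (\tilde a_1,\dots,\tilde a_m,\tilde b_1,\dots,\tilde b_m)$ be the coefficients in equation (3.7). Making use of equations (3.5) and (3.6), and using the class $\tilde{\mathcal F}_n$ in place of $\mathcal F_n$ in the definition of $\chi^2_{n,k}$, we obtain
$$\tilde a_i = 2(a_i - a_{m+1}),\ i = 1,\dots,m,\qquad \tilde b_j = 2(b_j - b_{m+1}),\ j = 1,\dots,m,\qquad \tilde a_0 = 2(a_{m+1} + b_{m+1}). \tag{3.26}$$
From the proof of Proposition 3.1 we get, setting $\delta_{ij} = 1$ for $i = j$ and $\delta_{ij} = 0$ otherwise,
$$n\chi^2_{n,k} = \frac14\sum_{i=1}^m\sum_{j=1}^m\left[\tilde a_i\tilde a_j\left(N_{i\cdot}\delta_{ij} - \frac{N_{i\cdot}N_{j\cdot}}{n}\right) + 2\tilde a_i\tilde b_j\left(N_{ij} - \frac{N_{i\cdot}N_{\cdot j}}{n}\right) + \tilde b_i\tilde b_j\left(N_{\cdot i}\delta_{ij} - \frac{N_{\cdot i}N_{\cdot j}}{n}\right)\right],$$
which, using (3.26) and after some algebra, yields (3.24).

□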
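Both right-hand sides in Lemma 3.14 can be computed directly for $d = 2$. The sketch below is a minimal illustration under stated assumptions. Each constrained minimum is solved as an equality-constrained quadratic program through its KKT linear system, with signed $q$ allowed (no positivity constraint). The result is compared with the quadratic forms $n\,\delta'\hat S^{-1}\delta$ and $n\,\delta'S^{-1}\delta$, built from the indicators of the first $m$ cells of each margin. Here $\delta$ collects $p_i - N_{i\cdot}/n$ and $p_j - N_{\cdot j}/n$, $\hat S$ is their covariance under the empirical cell frequencies, and $S$ is their covariance under $P$. The law `P_cells`, the widths `p` and all helper names are hypothetical.

```python
import numpy as np

def min_constrained_chi2(N, margins, weights):
    """min over signed q with row and column sums equal to `margins` of
    sum_ij (n q_ij - N_ij)^2 / weights_ij, solved through the KKT system."""
    n = N.sum()
    K = N.shape[0]                                   # K = m + 1 cells per coordinate
    W = (n ** 2 / weights).ravel()                   # objective = sum_ij W_ij (q_ij - N_ij / n)^2
    t = (N / n).ravel()
    A_rows = np.kron(np.eye(K), np.ones(K))          # row sums of q (row-major vectorization)
    A_cols = np.kron(np.ones(K), np.eye(K))[: K - 1] # first K - 1 column sums; the last is implied
    A = np.vstack([A_rows, A_cols])
    b = np.concatenate([margins, margins[: K - 1]])
    c = A.shape[0]
    KKT = np.block([[2 * np.diag(W), A.T], [A, np.zeros((c, c))]])
    q = np.linalg.solve(KKT, np.concatenate([2 * W * t, b]))[: K * K]
    return float(np.sum(W * (q - t) ** 2))

def cell_cov(C, m):
    """Covariance of (1{X_1 in A_i})_{i<=m}, (1{X_2 in A_l})_{l<=m} for cell probabilities C."""
    rows, cols = C.sum(axis=1)[:m], C.sum(axis=0)[:m]
    top = np.diag(rows) - np.outer(rows, rows)
    bottom = np.diag(cols) - np.outer(cols, cols)
    cross = C[:m, :m] - np.outer(rows, cols)
    return np.block([[top, cross], [cross.T, bottom]])

rng = np.random.default_rng(3)
m, n = 3, 5000
p = np.array([0.2, 0.3, 0.25, 0.25])                # hypothetical widths p_1, ..., p_{m+1}
P_cells = 0.5 * np.outer(p, p) + 0.5 * np.diag(p)   # hypothetical P in Omega (marginals p)
N = rng.multinomial(n, P_cells.ravel()).reshape(m + 1, m + 1)

neyman = min_constrained_chi2(N, p, N)              # right-hand side of (3.24)
pearson = min_constrained_chi2(N, p, n * P_cells)   # right-hand side of (3.25)

delta = np.concatenate([p[:m] - N.sum(axis=1)[:m] / n, p[:m] - N.sum(axis=0)[:m] / n])
dual_neyman = n * delta @ np.linalg.solve(cell_cov(N / n, m), delta)
quad_form = n * delta @ np.linalg.solve(cell_cov(P_cells, m), delta)  # gamma' S^{-1} gamma
print(neyman, dual_neyman, pearson, quad_form)
```

Since $\gamma_{n,k} = \sqrt n(P_n - P)\tilde f = -\sqrt n\,\delta$ on the cell indicators, `quad_form` is the quadratic form on the left of (3.25); by the invariance $\nu_n'S_k^{-1}\nu_n = \tilde\nu_n'\tilde S_k^{-1}\tilde\nu_n$ it does not matter whether cell or cumulative indicators are used.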

We can now refine Theorem 3.13.

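Theorem 3.15. Let (3.21) hold. Assume that $P \in \Omega$ satisfies the condition in Lemma 3.11 for some $\alpha > 0$.

Let $m(n)$ be such that $\lim_{n\to\infty}m = \infty$ and $\lim_{n\to\infty}m^{3/2}n^{-1/2}\log n = 0$.

Then, under $H_0$, with $k = 2m$,
$$\frac{n\chi^2_{n,k} - 2m}{\sqrt{4m}} \xrightarrow{d} N(0,1).$$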



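Proof. It is enough to prove $(n\chi^2_{n,k} - \gamma'_{n,k}S_k^{-1}\gamma_{n,k})/\sqrt m = o_P(1)$.

Denote by $\hat P$ and $\tilde P$ the minimizers of (3.24) and (3.25) in $\mathcal Q$, and let $\hat p_{ij}$ and $\tilde p_{ij}$ denote the respective probabilities of the cells.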
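We write
$$n\chi^2_{n,k} - \gamma'_{n,k}S_k^{-1}\gamma_{n,k} \le \sum_{i=1}^{m+1}\sum_{j=1}^{m+1}\frac{(n\tilde p_{ij} - N_{ij})^2}{np_{ij}}\left(\frac{np_{ij}}{N_{ij}} - 1\right) \le \max_{i,j}\left(\frac{np_{ij}}{N_{ij}} - 1\right)\gamma'_{n,k}S_k^{-1}\gamma_{n,k}$$
and
$$n\chi^2_{n,k} - \gamma'_{n,k}S_k^{-1}\gamma_{n,k} \ge \sum_{i=1}^{m+1}\sum_{j=1}^{m+1}\frac{(n\hat p_{ij} - N_{ij})^2}{N_{ij}}\left(1 - \frac{N_{ij}}{np_{ij}}\right) \ge -\max_{i,j}\left(\frac{N_{ij}}{np_{ij}} - 1\right)n\chi^2_{n,k}.$$
Whenever
$$\sqrt m\,\max_{i,j}\left|\frac{N_{ij}}{np_{ij}} - 1\right| \xrightarrow{P} 0 \tag{3.27}$$
holds (and then the same holds for $\max_{i,j}|np_{ij}/N_{ij} - 1|$), the above inequalities, together with $\gamma'_{n,k}S_k^{-1}\gamma_{n,k} = 2m + O_P(\sqrt m)$, yield
$$\frac{n\chi^2_{n,k} - \gamma'_{n,k}S_k^{-1}\gamma_{n,k}}{\sqrt{4m}} = o_P\big(m^{-1/2}\big)\,O_P\big(\sqrt m\big) = o_P(1),$$
which proves the claim.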

We now prove (3.27). We proceed as in Lemma 2 in [3], using inequalities (10.3.2) in [26]. Let $B_n \sim \mathrm{Bin}(n,p)$. Then, for $t > 1$,
$$\Pr\left(\frac{np}{B_n} \ge t\right) \le \exp\{-np\,h(1/t)\} \quad\text{and}\quad \Pr\left(\frac{B_n}{np} \ge t\right) \le \exp\{-np\,h(t)\}, \tag{3.28}$$
where $h(t) = t\log t - t + 1$ is positive for $t \ne 1$.
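Since $N_{ij} \sim \mathrm{Bin}(n, p_{ij})$,
$$\Pr\left\{\max_{i,j}\left(\frac{N_{ij}}{np_{ij}} - 1\right) > \frac{t}{\sqrt m}\right\} \le \sum_{i=1}^{m+1}\sum_{j=1}^{m+1}\Pr\left\{\frac{N_{ij}}{np_{ij}} > 1 + \frac{t}{\sqrt m}\right\} \le \sum_{i=1}^{m+1}\sum_{j=1}^{m+1}\exp\left\{-np_{ij}\,h\left(1 + \frac{t}{\sqrt m}\right)\right\}$$
$$\le (m+1)^2\exp\left\{-c\alpha\,\frac{n}{m^2\log n}\,h\left(1 + \frac{t}{\sqrt m}\right)\log n\right\},$$
where the last inequality follows from (3.21) and from $p_{ij} \ge \alpha\,p_{i\cdot}p_{\cdot j}$, $c$ denoting a positive constant.

For $x = 1 + \varepsilon$, $h(x) = \varepsilon^2/2 + O(\varepsilon^3)$. Therefore, using (3.17) with $k = 2m$, for every $M > 0$ there exists $n$ large enough that
$$c\alpha\,\frac{n}{m^2\log n}\,h\left(1 + \frac{t}{\sqrt m}\right) > M,$$
and consequently $\Pr\{\max_{i,j}(N_{ij}/(np_{ij}) - 1) > t/\sqrt m\} \to 0$.

To get convergence to zero of $\Pr\{\max_{i,j}(1 - N_{ij}/(np_{ij})) > t/\sqrt m\} = \Pr\{\max_{i,j}np_{ij}/N_{ij} > (1 - t/\sqrt m)^{-1}\}$, the first inequality in (3.28) is used in a similar way. □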
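The two inequalities in (3.28) are Chernoff-type bounds for the binomial distribution. The sketch below compares them with exact binomial tail probabilities for a few hypothetical values of $(n, p, t)$. It also checks that $h(1+\varepsilon) \approx \varepsilon^2/2$, the behaviour that drives the rate in the proof above. Only the Python standard library is used; the function names are illustrative.

```python
import math

def h(t):
    """h(t) = t log t - t + 1; h(1) = 0 and h(t) > 0 for t != 1."""
    return t * math.log(t) - t + 1

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(n, p, t):
    """Exact Pr(B_n / (n p) >= t) for B_n ~ Bin(n, p)."""
    return sum(binom_pmf(n, p, k) for k in range(math.ceil(t * n * p), n + 1))

def lower_tail(n, p, t):
    """Exact Pr(n p / B_n >= t) = Pr(B_n <= n p / t), t > 1 (the event B_n = 0 included)."""
    return sum(binom_pmf(n, p, k) for k in range(0, math.floor(n * p / t) + 1))

for n, p, t in [(50, 0.1, 1.5), (200, 0.05, 2.0), (200, 0.3, 1.2)]:
    print(n, p, t,
          upper_tail(n, p, t) <= math.exp(-n * p * h(t)),
          lower_tail(n, p, t) <= math.exp(-n * p * h(1 / t)))
```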

Remark 3.16. The preceding arguments carry over to the case $d > 2$ and lead to the condition $\lim_n m^{(d+1)/2}n^{-1/2}\log n = 0$. However, for $d > 6$ this last condition is stronger than (3.18).
4. Application: a contamination model

Let $\mathcal P_\Theta = \{f_\theta : \theta \in \Theta\}$ be an identifiable class of densities on $\mathbb R$. A contamination model typically writes
$$p(x) = (1-\lambda)f_\theta(x) + \lambda r(x), \tag{4.1}$$
where $\lambda$ is supposed to be close to zero and $r(x)$ is a density on $\mathbb R$ which represents the distribution of the contaminating data.

An example is when $f_\theta(x) = \theta e^{-\theta x}$, $x > 0$, and $r(x)$ is a Pareto-type distribution, say
$$r(x) := r_{\gamma,v}(x) = \gamma v^\gamma x^{-(\gamma+1)}, \tag{4.2}$$
with $x > v$ and $\gamma > 1$, $v > 1$.

Such a case corresponds to a proportion $\lambda$ of outliers generated by the density $r$.

We test contamination when we have at hand a sample $X_1,\dots,X_n$ of i.i.d. r.v.'s with unknown density function $p(x)$ as in (4.1). We state the test paradigm as follows.

Let $H_0$ denote the composite null hypothesis $\lambda = 0$, i.e.

$H_0$: $p(x) = f_\theta(x)$, $\theta \in \Theta$
versus

$H_1$: $p(x) = (1-\lambda)f_\theta(x) + \lambda r(x)$

for some $\theta \in \Theta$ and with $\lambda \ne 0$.

Such problems have been addressed in the recent literature; see [IPj and references 
therein. We assume identifiability, stating that, under $H_1$, $\lambda$, $\theta$ and $r$ are uniquely defined. This assumption holds for example when $f_\theta(x) = \theta e^{-\theta x}$ and $r(x)$ is as in (4.2).

For test problems pertaining to $\lambda$ we embed $p(x)$ in the class of density functions of signed measures with total mass 1, allowing $\lambda$ to belong to $\Lambda_0$, an open interval that contains 0.

In order to present the test statistic, we first consider a simplified version of the 
problem above. 

Assume that $\theta_0 = a$ is fixed, i.e. $\Theta = \{a\}$. We consider the hypotheses

$H_0$: $p(x) = f_a(x)$
versus

$H_1$: $p(x) = (1-\lambda)f_a(x) + \lambda r(x)$, with $\lambda \ne 0$.

In this case $\Omega = \{f_a\}$ and the null hypothesis $H_0$ is simple.

For this problem the $\chi^2$ approach appears legitimate. From the discussion in Section 1, the $\chi^2$ criterion is robust against inliers. A contamination model such as (4.1) captures the outlier contamination through the density $r$. As such, the test statistic does not need to have any robustness property against outliers, since they are included in the model. On the contrary, missing data might unduly lead to a decision in favour of $H_1$. Therefore the test statistic should be robust against such cases.
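By the necessary inclusion $f^* = 2\left(\frac{dQ}{dP} - 1\right) \in \mathcal F$ we define
$$\mathcal F = \mathcal F_a = \left\{g_\lambda = 2\left(\frac{f_a}{(1-\lambda)f_a + \lambda r} - 1\right) \text{ such that } \int|g_\lambda|f_a < \infty,\ \lambda \in \Lambda_0\right\}. \tag{4.3}$$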

Following (HH) 

$$\chi^2_n(f_a, P) = \sup_{g\in\mathcal F}\int gf_a - T(g;P_n). \tag{4.4}$$



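Example 4.1. Consider the case $f_a(x) = ae^{-ax}$ and $r(x) = \gamma v^\gamma x^{-(\gamma+1)}$ for some fixed $\gamma$, $x > v$.

Then $\mathcal F_a = \left\{2\left(\frac{ae^{-ax}}{(1-\lambda)ae^{-ax} + \lambda r(x)} - 1\right),\ \lambda \in \Lambda_0 \text{ such that } \int|g_\lambda|f_a < \infty\right\}$.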
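A minimal sketch of the statistic (4.4) in the setting of Example 4.1 follows. It rests on several explicit assumptions that are not taken from the paper:

- $T(g;P_n) = P_n(g + g^2/4)$, the convex-conjugate term corresponding to $\chi^2(Q,P) = \int(dQ/dP - 1)^2dP$. This is consistent with $f^* = 2(dQ/dP - 1)$ in (4.3), but $T$ is not restated in this section.
- The supremum over $\lambda$ is approximated on a hypothetical grid restricted to $\lambda \ge 0$, because negative values can make $(1-\lambda)f_a + \lambda r$ vanish in the Pareto tail.
- The integral $\int g_\lambda f_a$ is computed by the trapezoidal rule on a truncated range.
- The sample, with contamination $\lambda = 0.1$, is simulated, and all parameter values and function names are hypothetical.

```python
import numpy as np

def f_exp(x, a):
    """Exponential density a e^{-a x}, x >= 0."""
    return np.where(x >= 0, a * np.exp(-a * np.maximum(x, 0.0)), 0.0)

def r_pareto(x, gamma, v):
    """Pareto density gamma v^gamma x^{-(gamma+1)} on x > v, as in (4.2)."""
    xs = np.maximum(x, v)                            # keeps the power finite off the support
    return np.where(x > v, gamma * v**gamma * xs ** (-(gamma + 1.0)), 0.0)

def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def g_lambda(x, lam, a, gamma, v):
    """g_lambda = 2 (f_a / ((1 - lambda) f_a + lambda r) - 1), the functions in (4.3)."""
    fa = f_exp(x, a)
    denom = (1.0 - lam) * fa + lam * r_pareto(x, gamma, v)
    # where denom == 0 (only lambda = 0 with f_a underflowing), the ratio f_a / f_a is 1
    ratio = np.divide(fa, denom, out=np.ones_like(fa), where=denom > 0)
    return 2.0 * (ratio - 1.0)

def chi2_statistic(sample, a, gamma, v, lambdas, x_max=60.0, n_grid=200_001):
    """Grid-search sketch of sup_lambda int g_lambda f_a dx - P_n(g_lambda + g_lambda^2 / 4),
    assuming T(g; P_n) = P_n(g + g^2 / 4)."""
    grid = np.linspace(0.0, x_max, n_grid)
    values = []
    for lam in lambdas:
        integral = trapezoid(g_lambda(grid, lam, a, gamma, v) * f_exp(grid, a), grid)
        g = g_lambda(sample, lam, a, gamma, v)
        values.append(integral - np.mean(g + g**2 / 4.0))
    best = int(np.argmax(values))
    return values[best], lambdas[best]

rng = np.random.default_rng(4)
n, a, gamma, v, lam0 = 2000, 1.0, 2.0, 2.0, 0.1      # hypothetical values
is_outlier = rng.random(n) < lam0
sample = np.where(is_outlier,
                  v * (1.0 - rng.random(n)) ** (-1.0 / gamma),   # Pareto draws by inversion
                  rng.exponential(scale=1.0 / a, size=n))
lambdas = np.linspace(0.0, 0.3, 31)                  # hypothetical grid, lambda >= 0 only
stat, lam_hat = chi2_statistic(sample, a, gamma, v, lambdas)
print(stat, lam_hat)
```

Since $g_0 = 0$, the value at $\lambda = 0$ is exactly zero, so the grid supremum is always nonnegative.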

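Let us now turn back to the composite hypothesis.
Let $\Omega$ be defined by
$$\Omega = \{q(x) = f_a(x),\ a \in \Theta\}.$$
We can write
$$\mathcal F_a = \left\{g(\theta,\lambda,a) = 2\left(\frac{f_a}{(1-\lambda)f_\theta + \lambda r} - 1\right) : \int|g|f_a < \infty,\ \lambda \in \Lambda_0,\ \theta \in \Theta\right\}$$
and
$$\chi^2(\Omega, P) = \inf_{a\in\Theta}\sup_{g\in\mathcal F_a}\int gf_a - T(g;P).$$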



The supremum is to be found over a class of functions $\mathcal F_a$ which changes with $a$. Denote by $\Lambda_a$ the subset of $\Theta\times\Lambda_0$ which parametrizes $\mathcal F_a$.

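Example 4.2 (Continued). We assume $\Theta = [\underline a, \bar a]$, which corresponds, in our example, to the restriction of the expected value of $P$ (under $H_0$) to the finite interval $[1/\bar a, 1/\underline a]$.

Therefore
$$\chi^2_n(\Omega, P) = \inf_{a\in\Theta}\sup_{(\theta,\lambda)\in\Lambda_a}\int 2\left(\frac{ae^{-ax}}{(1-\lambda)\theta e^{-\theta x} + \lambda r_{\gamma,v}(x)} - 1\right)ae^{-ax}\,dx - T(g(\theta,\lambda,a);P_n). \tag{4.5}$$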

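The supremum in (4.5) is evaluated over a set which changes with $a$.
In accordance with the discussion in Section 2 we may define
$$\mathcal F = \left\{g(\theta,\lambda,\beta) = 2\left(\frac{f_\beta}{(1-\lambda)f_\theta + \lambda r} - 1\right) : \int|g(\theta,\lambda,\beta)|f_a < \infty,\ (a,\theta,\beta) \in \Theta^3,\ \lambda \in \Lambda_0\right\} \subset \{g(\theta,\lambda,\beta) : \lambda \in \Lambda_0,\ (\theta,\beta) \in \Theta^2\}, \tag{4.6}$$
a class not depending upon $a$.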

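The resulting test statistic would then be
$$\chi^2_n(\Omega, P) = \inf_{a\in\Theta}\sup_{g(\theta,\lambda,\beta)\in\mathcal F}\int g(\theta,\lambda,\beta)\,ae^{-ax}\,dx - T(g(\theta,\lambda,\beta);P_n), \tag{4.7}$$
and the supremum in (4.7) is determined on a set that does not depend on $a$.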
The use of (4.5) is proposed by M. Broniatowski and A. Keziou. Also in our context it is easy to see that (4.5) is preferable to (4.7), in the sense that it considerably reduces the computational complexity of the problem, from a subset of $\{(\theta,\lambda,\beta) \in \Theta\times\Lambda_0\times\Theta\}$ to a subset of $\{(\lambda,\theta) \in \Lambda_0\times\Theta\}$.

We first derive the asymptotic distribution of the test statistic $\chi^2_n$ under $H_1$; in order to use Theorem 2.6, we commute the inf and sup operators in (4.5) through the following Lemma 4.3.

Assume 

(A1) $\Theta$ is compact.

(A2) For all $a$ in $\Theta$, $\Lambda_a$ is compact.

Condition (A2) is verified in our example due to the compactness of the interval $\Theta$ and to the distribution of the outliers.

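Lemma 4.3. Let $\Theta_{\theta_1,\lambda,\theta_2}$ denote the set of those $a \in \Theta$ for which $\int|g(\theta_1,\lambda,\theta_2)|f_a < \infty$.

Under (A1) and (A2),
$$\inf_{a\in\Theta}\ \sup_{(\theta_1,\lambda,\theta_2):\,a\in\Theta_{\theta_1,\lambda,\theta_2}}\int g(\theta_1,\lambda,\theta_2)f_a - T(g(\theta_1,\lambda,\theta_2);P) = \sup_{(\theta_1,\theta_2,\lambda)\in\Theta^2\times\Lambda_0}\ \inf_{a\in\Theta_{\theta_1,\lambda,\theta_2}}\int g(\theta_1,\lambda,\theta_2)f_a - T(g(\theta_1,\lambda,\theta_2);P). \tag{4.8}$$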

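Proof. For $\Theta_{\theta_1,\lambda,\theta_2}$ defined as above we have
$$\inf_{a\in\Theta}\ \sup_{(\theta_1,\lambda,\theta_2)}\int g(\theta_1,\lambda,\theta_2)f_a - T(g(\theta_1,\lambda,\theta_2);P) \ge \sup_{(\theta_1,\lambda,\theta_2)}\ \inf_{a\in\Theta_{\theta_1,\lambda,\theta_2}}\int g(\theta_1,\lambda,\theta_2)f_a - T(g(\theta_1,\lambda,\theta_2);P). \tag{4.9}$$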

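On the other hand,
$$\sup_{\theta_1,\lambda,\theta_2}\int gf_a - T(g;P) = \sup_{\theta_1,\lambda,\theta_2}\left\{\int 2\,\frac{f_{\theta_2}}{(1-\lambda)f_{\theta_1} + \lambda r}\,f_a\,dx - \left[\int\left(\frac{f_{\theta_2}}{(1-\lambda)f_{\theta_1} + \lambda r}\right)^2p\,dx + 1\right]\right\}$$
$$= \sup_{\theta_1,\lambda,\theta_2}\left\{-\int\left(\frac{f_{\theta_2}}{(1-\lambda)f_{\theta_1} + \lambda r} - \frac{f_a}{p}\right)^2p\,dx\right\} + \int\left(\frac{f_a}{p} - 1\right)^2p\,dx \le \int\left(\frac{f_a}{p} - 1\right)^2p\,dx = \chi^2(f_a, p),$$
and equality holds if $(\theta_1,\theta_2,\lambda)$ are such that $\theta_2 = a$ and $p = (1-\lambda)f_{\theta_1} + \lambda r$ (identifiability allows us to find such $(\theta_1,\theta_2,\lambda)$ for every $a \in \Theta$ and for every contaminated measure $p$).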
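Also we have
$$\sup_{(\theta_1,\lambda,\theta_2)}\ \inf_{a\in\Theta_{\theta_1,\lambda,\theta_2}}\int gf_a - T(g;P) = \chi^2(f_{a^*}, p)$$
for some $a^*$ in $\Theta$.

We thus get
$$\sup_{(\theta_1,\lambda,\theta_2)}\ \inf_{a}\int gf_a - T(g;P) = \chi^2(f_{a^*}, p) \ge \chi^2(\Omega, P) = \inf_{a\in\Theta}\ \sup_{(\theta_1,\lambda,\theta_2)}\int gf_a - T(g;P),$$
which, by (4.9), concludes the proof. □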
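The key computation in the proof of Lemma 4.3 can be checked numerically. With $q = (1-\lambda)f_{\theta_1} + \lambda r$, it states that $\int gf_a - T(g;P) = \chi^2(f_a,p) - \int(f_{\theta_2}/q - f_a/p)^2p\,dx$, so the value never exceeds $\chi^2(f_a,p)$ and reaches it when $\theta_2 = a$ and $q = p$. The sketch below assumes $T(g;P) = \int(g + g^2/4)\,dP$, the conjugate term for $\chi^2(Q,P) = \int(dQ/dP - 1)^2dP$. Its densities follow Example 4.1, integrals use the trapezoidal rule on $[0, 40]$, and all parameter values and names are hypothetical.

```python
import numpy as np

def f_exp(x, theta):
    """Exponential density theta e^{-theta x}, x >= 0."""
    return theta * np.exp(-theta * x)

def r_pareto(x, gamma, v):
    """Pareto density gamma v^gamma x^{-(gamma+1)} on x > v."""
    xs = np.maximum(x, v)
    return np.where(x > v, gamma * v**gamma * xs ** (-(gamma + 1.0)), 0.0)

def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

x = np.linspace(0.0, 40.0, 400_001)                  # integration grid on [0, 40]
gamma, v = 2.0, 2.0                                   # hypothetical Pareto parameters
theta0, lam0, a = 1.2, 0.1, 1.0                       # hypothetical true p and null density f_a
p = (1 - lam0) * f_exp(x, theta0) + lam0 * r_pareto(x, gamma, v)
fa = f_exp(x, a)
chi2 = trapezoid((fa / p - 1.0) ** 2 * p, x)          # chi^2(f_a, p)

def dual_value(theta1, lam, theta2):
    """int g f_a - T(g; P) with g = 2 (f_theta2 / ((1 - lam) f_theta1 + lam r) - 1),
    assuming T(g; P) = int (g + g^2 / 4) dP."""
    q = (1 - lam) * f_exp(x, theta1) + lam * r_pareto(x, gamma, v)
    g = 2.0 * (f_exp(x, theta2) / q - 1.0)
    return trapezoid(g * fa, x) - trapezoid((g + g**2 / 4.0) * p, x)

at_truth = dual_value(theta0, lam0, a)                # theta2 = a and q = p: equality case
elsewhere = dual_value(1.0, 0.05, 1.3)
q_other = (1 - 0.05) * f_exp(x, 1.0) + 0.05 * r_pareto(x, gamma, v)
gap = trapezoid((f_exp(x, 1.3) / q_other - fa / p) ** 2 * p, x)
print(chi2, at_truth, elsewhere, chi2 - gap)
```

Because the identity holds pointwise before integration, the same quadrature reproduces it up to rounding, independently of the truncation of the range.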



Theorem 2.6 implies consistency of $\chi^2_n$ as an estimator of $\chi^2(\Omega, P)$ and convergence in distribution of $\sqrt n\,(\chi^2_n - \chi^2(\Omega, P))$ to a normally distributed r.v. with mean zero and variance given by $P\big((g^* + g^{*2}/4)^2\big) - \big(P(g^* + g^{*2}/4)\big)^2$ under $H_1$.

The asymptotic distribution under the null hypothesis can be found subject to the choice of the parametric class $\{f_a\}$ and of the density $r$, as can be deduced from Theorem 3.5 in [9]. Following their Theorem 3.5, which holds for composite hypothesis testing in a parametric environment, the test statistic $n\chi^2_n$ converges weakly, under $H_0$, to a chi-squared distribution with degrees of freedom depending on the dimension of the parameter space $\Theta$ and on the number of constraints induced by $P \in \Omega$.

In the following, we focus on definition (4.5) for $\chi^2_n(\Omega, P)$.

The null hypothesis reduces the space $\Theta\times\Lambda_0$ to $\Theta\times\{0\}$.

Theorem 3.5 in [9] implies that the number of degrees of freedom $d$ of the limiting chi-squared distribution equals the number of parameters of $P$ under $H_0$. In the following we assume $d = 1$, as in Example 4.1.

Let $h(\theta,\lambda;x) = (1-\lambda)f_\theta(x) + \lambda r(x)$.

Checking conditions (C.12)-(C.15) in [9] yields: 

Theorem 4.4. Under $H_0$, with $P = P_{\theta_0}$, assume that

(i) the class of contaminated densities $\{h(\theta,\lambda),\ \theta \in \Theta,\ \lambda \in \Lambda_0\}$ is $P_{\theta_0}$-identifiable;

(ii) the class of functions $\{h(a,v)/h(\theta,\lambda),\ \theta \in \Theta,\ \lambda \in \Lambda_0,\ a \in \Theta,\ |v| < \varepsilon\}$ is $P_{\theta_0}$-GC for some $\varepsilon$ small enough;

(iii) the densities $f_\theta$ are differentiable up to the second order in some neighborhood $V(\theta_0)$ of $\theta_0$, and $F_\theta(x) = \int_{-\infty}^x f_\theta(u)\,du$ is differentiable with respect to $\theta$;

(iv) there exists a neighborhood $V$ of $(\theta_0, 0, \theta_0, 0)$ such that, for every $(\theta,\lambda,a,v) \in V$, we have

j^<Hiix), ^^<H3ix), 
j^<H2ix), ^^<H4ix), 

where each of the functions $H_j$ ($j = 1,2,3,4$) is square integrable w.r.t. the density $h(a,v)$ and is in $L^1(P_{\theta_0})$.

Then $n\chi^2_n$ converges to a chi-squared distributed r.v. with one degree of freedom.